Generative vs. Discriminative Deep Belief Network for 3D Object Categorization

Nabila Zrira, Mohamed Hannat, El Houssine Bouyakhf, Haris Ahmad khan

2017

Abstract

Object categorization has been an important task in computer vision research in recent years. In this paper, we propose a new approach for representing and learning 3D object categories. First, we extract the Viewpoint Feature Histogram (VFH) descriptor from point clouds, and then we learn the resulting features using deep learning architectures. We evaluate the performance of both generative and discriminative deep belief network architectures (GDBN/DDBN) on the object categorization task. GDBN trains a sequence of Restricted Boltzmann Machines (RBMs), while DDBN uses a new deep architecture based on RBMs and the joint density model. Our results show the power of the discriminative model for object categorization and outperform state-of-the-art approaches when tested on the Washington RGB-D dataset.
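To make the generative side of the abstract concrete, the following is a minimal sketch of greedy layer-wise training of a stack of RBMs with one-step contrastive divergence (CD-1), which is the standard way a sequence of RBMs is trained. It is not the authors' implementation: the class and function names, layer sizes, learning rate, and the synthetic input are all assumptions made for illustration. The input width of 308 matches the length of the VFH signature in the Point Cloud Library, but the data here is random and merely stands in for real descriptors. The discriminative variant (DDBN) and its joint density model are not covered.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (illustrative, hypothetical class)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        # Approximate likelihood gradient: data statistics minus model statistics.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        # Reconstruction error, useful only as a rough progress indicator.
        return float(np.mean((v0 - v1) ** 2))


def pretrain_stack(data, layer_sizes, epochs=5, batch_size=32, seed=0):
    """Train one RBM per layer; each layer's hidden probabilities feed the next."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(layer_input))
            for start in range(0, len(order), batch_size):
                rbm.cd1_step(layer_input[order[start:start + batch_size]])
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms, layer_input


if __name__ == "__main__":
    # Synthetic stand-in for VFH descriptors (assumption: 308 bins, scaled to [0, 1]).
    rng = np.random.default_rng(42)
    fake_vfh = rng.random((64, 308))
    fake_vfh /= fake_vfh.max(axis=1, keepdims=True)
    rbms, features = pretrain_stack(fake_vfh, layer_sizes=[128, 64])
    print(features.shape)
```

In a full pipeline, the top-layer features would then be passed to a classifier or the whole stack fine-tuned with labels; those steps are left out here because the abstract does not specify them.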


Paper Citation


in Harvard Style

Zrira N., Hannat M., Bouyakhf E. and Ahmad khan H. (2017). Generative vs. Discriminative Deep Belief Netwok for 3D Object Categorization. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-226-4, pages 98-107. DOI: 10.5220/0006151100980107


in Bibtex Style

@conference{visapp17,
author={Nabila Zrira and Mohamed Hannat and El Houssine Bouyakhf and Haris Ahmad khan},
title={Generative vs. Discriminative Deep Belief Netwok for 3D Object Categorization},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={98-107},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006151100980107},
isbn={978-989-758-226-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)
TI - Generative vs. Discriminative Deep Belief Netwok for 3D Object Categorization
SN - 978-989-758-226-4
AU - Zrira N.
AU - Hannat M.
AU - Bouyakhf E.
AU - Ahmad khan H.
PY - 2017
SP - 98
EP - 107
DO - 10.5220/0006151100980107