Pushing the Limits for View Prediction in Video Coding

Jens Ogniewski, Per-Erik Forssén

Abstract

More and more devices have depth sensors, making RGB+D video data increasingly common. Depth images have also been considered for 3D and free-viewpoint video coding. Depth data can be used to render a given scene from different viewpoints, making it a useful asset for, e.g., view prediction in video coding. In this paper we evaluate a multitude of algorithms for scattered data interpolation, in order to optimize the performance of frame prediction for video coding. Our evaluation uses the depth extension of the Sintel datasets. Using ground-truth sequences is crucial for such an optimization, as it ensures that all errors and artifacts are caused by the prediction itself rather than by noisy or erroneous data. We also present a comparison with the commonly used mesh-based projection.
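To illustrate the general idea (a minimal sketch of classical depth-based 3D warping in the spirit of Mark et al. (1997), not the paper's specific pipeline), each source pixel can be back-projected using its depth and reprojected into the target camera. The function name `warp_view` and the assumption of shared intrinsics `K` are illustrative choices, not from the paper:

```python
import numpy as np

def warp_view(depth, K, R, t, h, w):
    """Forward-warp every source pixel into a target view using depth.

    depth : (h, w) per-pixel depth in the source camera.
    K     : (3, 3) camera intrinsics, assumed shared by both views.
    R, t  : rotation and translation from source to target camera.
    Returns an (h, w, 2) array of target-image coordinates.
    """
    # Pixel grid in homogeneous coordinates, shape 3 x N.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T

    # Back-project to 3D points in the source camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)

    # Rigid transform into the target frame, then perspective projection.
    pts_t = R @ pts + t.reshape(3, 1)
    proj = K @ pts_t
    return (proj[:2] / proj[2:3]).T.reshape(h, w, 2)
```

The warped coordinates generally land between target pixel centres, leaving scattered samples; resampling them onto the target grid is exactly the scattered data interpolation problem the paper evaluates.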

References

  1. Butler, D., Wulff, J., Stanley, G., and Black, M. (2012). A naturalistic open source movie for optical flow evaluation. In Proceedings of European Conference on Computer Vision, pages 611-625.
  2. Diebel, J. and Thrun, S. (2005). An application of Markov random fields to range sensing. In Advances in Neural Information Processing Systems (NIPS), pages 291-298. MIT Press.
  3. Dong, J., He, Y., and Ye, Y. (2012). Downsampling filter for anchor generation for scalable extensions of HEVC. In 99th MPEG meeting.
  4. Iyer, K., Maiti, K., Navathe, B., Kannan, H., and Sharma, A. (2010). Multiview video coding using depth based 3D warping. In Proceedings of IEEE International Conference on Multimedia and Expo, pages 1108-1113.
  5. Kopf, J., Cohen, M. F., Lischinski, D., and Uyttendaele, M. (2007). Joint bilateral upsampling. ACM Transactions on Graphics, 26(3).
  6. Ma, K., Wu, Q., Wang, Z., Duanmu, Z., Yong, H., Li, H., and Zhang, L. (2016). Group MAD competition - a new methodology to compare objective image quality models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1664-1673.
  7. Mall, R., Langone, R., and Suykens, J. (2014). Agglomerative hierarchical kernel spectral data clustering. In IEEE Symposium on Computational Intelligence and Data Mining, pages 9-16.
  8. Mark, W. R., McMillan, L., and Bishop, G. (1997). Postrendering 3d warping. In Proceedings of 1997 Symposium on Interactive 3D Graphics, pages 7-16.
  9. Morvan, Y., Farin, D., and de With, P. (2007). Incorporating depth-image based view-prediction into H.264 for multiview-image coding. In Proceedings of IEEE International Conference on Image Processing, volume I, pages 205-208.
  10. Panggabean, M., Tamer, O., and Ronningen, L. (2010). Parallel image transmission and compression using windowed kriging interpolation. In IEEE International Symposium on Signal Processing and Information Technology, pages 315-320.
  11. Ringaby, E., Friman, O., Forssén, P.-E., Opsahl, T., Haavardsholm, T., and Kåsen, I. (2014). Anisotropic scattered data interpolation for pushbroom image rectification. IEEE Transactions on Image Processing, 23(5):2302-2314.
  12. Scalzo, M. and Velipasalar, S. (2014). Agglomerative clustering for feature point grouping. In IEEE International Conference on Image Processing (ICIP), pages 4452-4456.
  13. Shimizu, S., Sugimoto, S., and Kojima, A. (2013). Backward view synthesis prediction using virtual depth map for multiview video plus depth map coding. In Visual Communications and Image Processing (VCIP), pages 1-6.
  14. Solh, M. and AlRegib, G. (2010). Hierarchical hole-filling (HHF): Depth image based rendering without depth map filtering for 3D-TV. In IEEE International Workshop on Multimedia Signal Processing.
  15. Szeliski, R. (2011). Computer Vision: Algorithms and Applications. Springer-Verlag London.
  16. Tian, D., Lai, P.-L., Lopez, P., and Gomila, C. (2009). View synthesis techniques for 3D video. In Proceedings of SPIE Applications of Digital Image Processing. SPIE.
  17. Wang, C., Lin, Z., and Chan, S. (2015). Depth map restoration and upsampling for Kinect v2 based on IR-depth consistency and joint adaptive kernel regression. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 133-136.
  18. Wang, Z., Simoncelli, E. P., and Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. In 37th IEEE Asilomar Conference on Signals, Systems and Computers.
  19. Yang, Q., Yang, R., Davis, J., and Nistér, D. (2007). Spatial-depth super resolution for range images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8.


Paper Citation


in Harvard Style

Ogniewski J. and Forssén P. (2017). Pushing the Limits for View Prediction in Video Coding. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-225-7, pages 68-76. DOI: 10.5220/0006131500680076


in Bibtex Style

@conference{visapp17,
author={Jens Ogniewski and Per-Erik Forssén},
title={Pushing the Limits for View Prediction in Video Coding},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={68-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006131500680076},
isbn={978-989-758-225-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2017)
TI - Pushing the Limits for View Prediction in Video Coding
SN - 978-989-758-225-7
AU - Ogniewski J.
AU - Forssén P.
PY - 2017
SP - 68
EP - 76
DO - 10.5220/0006131500680076