A Spatio-temporal Approach for Video Caption Extraction

Liang-Hua Chen, Meng-Chen Hsieh, Chih-Wen Su

Abstract

Captions in videos play an important role in video indexing and retrieval. In this paper, we propose a novel algorithm to extract multilingual captions from video. Our approach is based on the analysis of spatio-temporal slices of the video. If a horizontal (or vertical) scan line contains pixels of a caption region, the corresponding spatio-temporal slice exhibits bar-code-like patterns. By integrating the structural information of these bar-code-like patterns in the horizontal and vertical slices, the spatial and temporal positions of video captions can be located accurately. Experimental results show that the proposed algorithm is effective and outperforms several existing techniques.
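The slice idea can be sketched in a few lines: treating a video as a (frames, height, width) array, the horizontal spatio-temporal slice at scan line y is simply the plane video[:, y, :]. A static caption crossing that line appears as vertical, bar-code-like stripes in the slice. The snippet below is a minimal illustration of this observation, not the authors' implementation; the variance/gradient heuristic and all function names are our own assumptions.

```python
import numpy as np

def horizontal_slice(video, row):
    """Extract the horizontal spatio-temporal slice at a given scan line.

    video: grayscale array of shape (frames, height, width).
    Returns a (frames, width) image; a static caption crossing this
    scan line shows up as vertical bar-code-like stripes.
    """
    return video[:, row, :]

def barcode_score(slice_img):
    """Heuristic score for bar-code-like structure (an assumption, not
    the paper's measure): columns that are stable over time (low
    temporal variance) but alternate sharply along the spatial axis
    (high horizontal gradient) suggest caption text."""
    temporal_var = slice_img.var(axis=0)                    # per-column variance over time
    spatial_grad = np.abs(np.diff(slice_img.mean(axis=0)))  # contrast along the width
    stable = temporal_var <= temporal_var.mean()            # temporally stable columns
    return spatial_grad[stable[:-1]].mean() if stable[:-1].any() else 0.0

# Synthetic demo: 30 frames of 64x64 noise, plus a static "caption"
# band of alternating dark/bright columns occupying rows 40-47.
rng = np.random.default_rng(0)
video = rng.integers(0, 50, size=(30, 64, 64)).astype(float)
caption = np.tile([0.0, 255.0], 32)   # alternating columns, bar-code-like
video[:, 40:48, :] = caption          # identical in every frame

# A scan line through the caption scores far higher than a noise line.
print(barcode_score(horizontal_slice(video, 44)),
      barcode_score(horizontal_slice(video, 10)))
```

In this toy setting the caption row yields near-zero temporal variance with large spatial gradients, while a background row gives the opposite, which is exactly the contrast the slice-based localization exploits.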



Paper Citation


in Harvard Style

Chen L., Hsieh M. and Su C. (2016). A Spatio-temporal Approach for Video Caption Extraction. In Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 5: SIGMAP, (ICETE 2016) ISBN 978-989-758-196-0, pages 83-88. DOI: 10.5220/0005939300830088


in Bibtex Style

@conference{sigmap16,
author={Liang-Hua Chen and Meng-Chen Hsieh and Chih-Wen Su},
title={A Spatio-temporal Approach for Video Caption Extraction},
booktitle={Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 5: SIGMAP, (ICETE 2016)},
year={2016},
pages={83-88},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005939300830088},
isbn={978-989-758-196-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 5: SIGMAP, (ICETE 2016)
TI - A Spatio-temporal Approach for Video Caption Extraction
SN - 978-989-758-196-0
AU - Chen L.
AU - Hsieh M.
AU - Su C.
PY - 2016
SP - 83
EP - 88
DO - 10.5220/0005939300830088