Deep Learning with Sparse Prior - Application to Text Detection in the Wild

Adleni Mallek, Fadoua Drira, Rim Walha, Adel M. Alimi, Frank LeBourgeois

2017

Abstract

Text detection in the wild remains a very challenging task in computer vision: to date, no detection system is robust under all circumstances. In particular, the complexity and diversity of degradations in natural scenes leave traditional text-detection methods limited and inefficient. Recent studies highlight the performance of texture-based approaches, especially those built on deep models, whose main strength is a learning framework that couples feature extraction with classification. This study therefore develops a new texture-based approach for text detection that takes advantage of deep learning models. In particular, we investigate a sparse prior within the structure of PCANet, a convolutional neural network known for its simplicity and speed and based on a cascade of principal component analysis (PCA) stages. The added value of the sparse coding is that each feature map is represented via coupled dictionaries, allowing migration from one resolution level to an adequate lower resolution. A specificity of the dictionaries is their use of oriented patterns well suited to describing textual patterns. The experimental study performed on the standard ICDAR 2003 benchmark shows that the proposed method achieves very promising results.
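The approach sketched in the abstract rests on two ingredients: convolution filters learned by PCA (the PCANet stages) and sparse coding over a coupled dictionary pair that carries a patch representation from one resolution to a lower one. The NumPy sketch below is an illustrative reconstruction, not the authors' implementation; the patch size, filter count, and the greedy `omp` sparse coder are assumptions made for the example.

```python
import numpy as np

def learn_pca_filters(images, k=7, n_filters=8):
    """Learn k x k PCA filters from all zero-mean patches of the images
    (the filter-learning step of one PCANet stage)."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())      # remove the patch mean, as in PCANet
    X = np.asarray(patches)                       # (n_patches, k*k)
    # Leading principal components of the patch matrix become the filter bank.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, k, k)

def omp(D, x, sparsity):
    """Greedy sparse coding (orthogonal matching pursuit) of x over the
    columns of D (assumed unit-norm); returns a sparse coefficient vector."""
    residual, support = x.astype(float).copy(), []
    code = np.zeros(D.shape[1])
    for _ in range(sparsity):
        idx = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code[support] = coef
    return code

def map_resolution(x_hi, D_hi, D_lo, sparsity=3):
    """Coupled-dictionary step: code the high-resolution patch over D_hi,
    then synthesise its low-resolution counterpart with the same code."""
    z = omp(D_hi, x_hi, sparsity)
    return D_lo @ z
```

With coupled dictionaries trained so that corresponding atoms describe the same pattern at two resolutions, the shared sparse code `z` transfers a feature-map patch across resolution levels; in the paper the atoms are oriented patterns suited to text strokes.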

References

  1. Anthimopoulos, M., Gatos, B., and Pratikakis, I. (2013). Detection of artificial and scene text in images and video frames. Pattern Anal. Appl., 16(3):431-446.
  2. Chan, T. H., Jia, K., Gao, S., Lu, J., Zeng, Z., and Ma, Y. (2014). PCANet: A simple deep learning baseline for image classification. IEEE Trans. on Image Processing, 24(12):5017-5032.
  3. Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., and Schmidhuber, J. (2011). High-performance neural networks for visual object classification. Technical Report IDSIA-01-11.
  4. Epshtein, B., Ofek, E., and Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. IEEE Computer Vision and Pattern Recognition, pages 2963-2970.
  5. Gao, R., Uchida, S., Shahab, A., Shafait, F., and Frinken, V. (2014). Visual saliency models for text detection in real world. PLoS ONE, 9(12):e114539.
  6. Garcia, C. and Apostolidis, X. (2000). Text detection and segmentation in complex color images. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 2326-2329.
  7. Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Computer Vision and Pattern Recognition.
  8. Gupta, A., Vedaldi, A., and Zisserman, A. (2016). Synthetic data for text localisation in natural images. IEEE Computer Vision and Pattern Recognition.
  9. He, T., Huang, W., Qiao, Y., and Yao, J. (2016). Text-attentional convolutional neural network for scene text detection. IEEE Trans. Image Processing, pages 2529-2541.
  10. Huang, W., Lin, Z., Yang, J., and Wang, J. (2013). Text localization in natural images using stroke feature transform and text covariance descriptors. IEEE Int. Conf. on Computer Vision, pages 1241-1248.
  11. Jaderberg, M., Vedaldi, A., and Zisserman, A. (2014). Deep features for text spotting. European Conf. on Computer Vision.
  12. Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). Imagenet classification with deep convolutional neural networks. Neural Information Processing Systems.
  13. Lee, J.-J., Lee, P.-H., Lee, S.-W., Yuille, A., and Koch, C. (2011). Adaboost for text detection in natural scene. In Int. Conf. on Document Analysis and Recognition, pages 429-434.
  14. Lienhart, R. and Wernicke, A. (2002). Localizing and segmenting text in images and videos. IEEE Trans. on Circuits and Systems for Video Technology, 12:256-268.
  15. Neumann, L. and Matas, J. (2012). Real-time scene text localization and recognition. IEEE Computer Vision and Pattern Recognition, pages 3538-3545.
  16. Neumann, L. and Matas, J. (2013). Scene text localization and recognition with oriented stroke detection. IEEE Int. Conf. on Computer Vision, pages 97-104.
  17. Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. Int. Conf. on Learning Representation.
  18. Socher, R., Pennington, J., Huang, E., Ng, A., and Manning, C. (2011). Semi-supervised recursive autoencoders for predicting sentiment distributions. In Conf. on Empirical Methods in Natural Language Processing, pages 151-161.
  19. Srinivas, S., Sarvadevabhatla, R., Mopuri, K., Prabhu, N., Kruthiventi, S., and Radhakrishnan, V. (2016). A taxonomy of deep convolutional neural nets for computer vision. Frontiers in Robotics and AI.
  20. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., and Reed, S. (2015). Going deeper with convolutions. Computer Vision and Pattern Recognition.
  21. Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi, A. (2014). Sparse coding with a coupled dictionary learning approach for textual image super-resolution. Int. Conf. on Pattern Recognition, pages 4459-4464.
  22. Walha, R., Drira, F., Lebourgeois, F., Garcia, C., and Alimi, A. (2015). Resolution enhancement of textual images via multiple coupled dictionaries and adaptive sparse representation selection. Int. Journal of Document Analysis and Recognition, 18(1):87-107.
  23. Wang, K., Babenko, B., and Belongie, S. (2011). End-to-end scene text recognition. In Int. Conf. on Computer Vision, pages 1457-1464.
  24. Wang, T., Wu, D. J., Coates, A., and Ng, A. Y. (2012). End-to-end text recognition with convolutional neural networks. Int. Conf. on Pattern Recognition, pages 3304-3308.
  25. Yang, J., Wright, J., Huang, T., and Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Trans. Image Process, 19(11):2861-2873.
  26. Ye, Q. and Doermann, D. (2015). Text detection and recognition in imagery: A survey. IEEE Trans. on Pattern Analysis and Machine Intelligence, 37(7):1480-1500.
  27. Yi, C. and Tian, Y. (2011). Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. on Image Processing, 20(9):2594-2605.
  28. Zhong, Y., Karu, K., and Jain, A. (1995). Locating text in complex color images. Pattern Recognition, pages 1523-1536.


Paper Citation


in Harvard Style

Mallek A., Drira F., Walha R., Alimi A. and LeBourgeois F. (2017). Deep Learning with Sparse Prior - Application to Text Detection in the Wild. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-226-4, pages 243-250. DOI: 10.5220/0006129102430250


in Bibtex Style

@conference{visapp17,
author={Adleni Mallek and Fadoua Drira and Rim Walha and Adel M. Alimi and Frank LeBourgeois},
title={Deep Learning with Sparse Prior - Application to Text Detection in the Wild},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={243-250},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006129102430250},
isbn={978-989-758-226-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)
TI - Deep Learning with Sparse Prior - Application to Text Detection in the Wild
SN - 978-989-758-226-4
AU - Mallek A.
AU - Drira F.
AU - Walha R.
AU - Alimi A.
AU - LeBourgeois F.
PY - 2017
SP - 243
EP - 250
DO - 10.5220/0006129102430250