Attention-based Text Recognition in the Wild

Zhi-Chen Yan, Stephanie A. Yu

2020

Abstract

Recognizing text in real-world scenes is an important research topic in computer vision, and many deep-learning-based techniques have been proposed for it. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. With this representation, useful 2D spatial information in the input image may be lost to the vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which the decoder then translates into a sequence of output characters. The encoding and decoding stages are integrated into a single end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
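The abstract's point that vector-based encoding can discard 2D layout can be illustrated with a toy example. This is a minimal sketch, not the authors' model: it assumes that "vector-based encoding" means averaging a feature map over its height (a common way to turn a 2D map into a 1D sequence), and it uses a scaled dot-product score as a stand-in for the paper's attention function, which the abstract does not specify. The names `collapse_height` and `attend_2d` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # H rows x W columns of C-dim feature vectors


def collapse_height(features: Grid) -> List[Vector]:
    """Assumed 'vector-based encoding': average each column over the height axis,
    producing a 1D sequence of W feature vectors."""
    h, w, c = len(features), len(features[0]), len(features[0][0])
    return [[sum(features[r][col][k] for r in range(h)) / h for k in range(c)]
            for col in range(w)]


def attend_2d(features: Grid, query: Vector) -> Tuple[List[List[float]], Vector]:
    """Attend over every (row, col) location of a 2D feature map.

    Returns the H x W attention map and the C-dim context vector.
    The scaled dot-product score is a stand-in, not the paper's scoring function.
    """
    c = len(query)
    scale = 1.0 / math.sqrt(c)
    # One score per spatial location, so the 2D layout is preserved.
    scores = [[scale * sum(fk * qk for fk, qk in zip(f, query)) for f in row]
              for row in features]
    # Softmax jointly over all H*W locations (max subtracted for stability).
    m = max(s for row in scores for s in row)
    exps = [[math.exp(s - m) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    weights = [[e / total for e in row] for row in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    context = [0.0] * c
    for w_row, f_row in zip(weights, features):
        for w, f in zip(w_row, f_row):
            for k in range(c):
                context[k] += w * f[k]
    return weights, context


# Two 2x2 toy feature maps whose vectors differ only in vertical arrangement.
a = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
b = [[[0.0, 1.0], [0.0, 1.0]],
     [[1.0, 0.0], [1.0, 0.0]]]

if __name__ == "__main__":
    # Height-averaging maps both to the same 1D sequence...
    print(collapse_height(a) == collapse_height(b))
    # ...while 2D attention with the same query still distinguishes them.
    print(attend_2d(a, [1.0, 0.0])[0])
    print(attend_2d(b, [1.0, 0.0])[0])
```

In this toy case both maps collapse to the same 1D sequence, so any decoder reading only that sequence cannot tell them apart, whereas attention computed over the 2D grid still yields different weights for each map.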

Paper Citation


in Harvard Style

Yan Z. and Yu S. (2020). Attention-based Text Recognition in the Wild. In Proceedings of the 1st International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA, ISBN 978-989-758-441-1, pages 42-49. DOI: 10.5220/0009970200420049


in Bibtex Style

@conference{delta20,
author={Zhi-Chen Yan and Stephanie Yu},
title={Attention-based Text Recognition in the Wild},
booktitle={Proceedings of the 1st International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA},
year={2020},
pages={42-49},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009970200420049},
isbn={978-989-758-441-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA
TI - Attention-based Text Recognition in the Wild
SN - 978-989-758-441-1
AU - Yan Z.
AU - Yu S.
PY - 2020
SP - 42
EP - 49
DO - 10.5220/0009970200420049