Fast Scene Text Detection with RT-LoG Operator and CNN

Dinh Nguyen, Mathieu Delalandre, Donatello Conte, The Pham


Text detection in scene images is of particular importance for computer-based applications. Text detection methods must be robust against the variability and deformation of text entities. In addition, to be embedded in mobile devices, the methods have to be time efficient. In this paper, a keypoint grouping method is proposed that first applies the real-time Laplacian of Gaussian (RT-LoG) operator to detect keypoints. These keypoints are grouped to produce character patterns. The patterns are then filtered with a CNN model before being aggregated into words. Performance evaluation is discussed on the ICDAR2017 RRC-MLT dataset and on Challenge 4 of the ICDAR2015 dataset. The results are given in terms of detection accuracy and processing time, compared against different end-to-end systems in the literature. Our system achieves one of the strongest detection accuracies while running at approximately 15.6 frames per second at HD resolution on a regular CPU architecture. This makes it one of the best candidates in the literature for balancing accuracy and speed.
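The keypoint detection step can be illustrated with a minimal sketch. It uses a plain, textbook Laplacian of Gaussian filter, not the RT-LoG operator itself, whose real-time formulation is not described in this abstract. The grouping and CNN filtering stages are not shown. The function names (log_kernel, filter_response, detect_keypoints), the border handling, and the values of sigma and threshold are all assumptions made for illustration.

```python
import math

def log_kernel(sigma, radius=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted so its entries sum to zero."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            # LoG up to a constant factor: (r^2 - 2*sigma^2) / sigma^4 * exp(-r^2 / (2*sigma^2))
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4 * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    size = 2 * radius + 1
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_response(image, kernel):
    """Filter a 2D list of intensities with the kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-radius, radius + 1):
                yy = min(max(y + ky, 0), h - 1)
                for kx in range(-radius, radius + 1):
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + radius][kx + radius] * image[yy][xx]
            out[y][x] = acc
    return out

def detect_keypoints(image, sigma=1.0, threshold=2.0):
    """Return (x, y) positions where |LoG response| exceeds the threshold
    and is strictly larger than at all 8 neighbours (assumed selection rule)."""
    resp = filter_response(image, log_kernel(sigma))
    h, w = len(resp), len(resp[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = abs(resp[y][x])
            if v <= threshold:
                continue
            neighbours = [abs(resp[y + dy][x + dx])
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(v > n for n in neighbours):
                points.append((x, y))
    return points

# Toy input: a 3x3 bright blob centred at (7, 7) on a dark 15x15 background.
image = [[0.0] * 15 for _ in range(15)]
for yy in range(6, 9):
    for xx in range(6, 9):
        image[yy][xx] = 1.0

keypoints = detect_keypoints(image, sigma=1.0, threshold=2.0)
print(keypoints)
```

On this toy image the only reported keypoint is the blob centre. A practical detector would run the filter at several scales and use an optimized implementation; the nested loops here favor readability over speed.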


Paper Citation