Authors:
P. Saragiotis and N. Papamarkos
Affiliation:
Democritus University of Thrace, Greece
Keyword(s):
Skew Correction, Text Area Localization, Connected Component Analysis, Linear Regression, Optical Character Recognition.
Related Ontology Subjects/Areas/Topics:
Computer Vision, Visualization and Computer Graphics; Enhancement and Restoration; Image and Video Analysis; Image Formation and Preprocessing; Segmentation and Grouping
Abstract:
In this paper we propose a technique for detecting and correcting the skew of text areas in a document. The documents we work with may contain several areas of text with different skew angles. In the first stage, a text localization procedure based on connected component analysis is applied. Specifically, the connected components of the document are extracted and filtered according to their size and geometric characteristics. Next, the candidate characters are grouped using a nearest-neighbour approach, first into words and then into text lines of arbitrary skew. Using linear regression, two lines are estimated for each text line, representing its top and bottom boundaries. Text lines that lie close together and have similar skew angles are grown into text areas. Each text area is then rotated independently to a horizontal or vertical orientation. The technique has been tested on a wide variety of documents, including spreadsheets, book and magazine covers, and advertisements, and has proved efficient and robust.
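The boundary-estimation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate character is given as an axis-aligned bounding box, fits one least-squares line through the box tops and one through the box bottoms, and takes the mean of the two slopes as the skew of the text line. The function names `fit_line` and `text_line_skew`, the box format, and the slope averaging are all assumptions made for illustration.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        # All points share one x coordinate, so y = a*x + b cannot be fitted.
        raise ValueError("cannot fit a line: all x values are equal")
    a = sxy / sxx
    return a, mean_y - a * mean_x


def text_line_skew(boxes):
    """Estimate the boundaries and skew of one text line.

    boxes: list of (x_min, y_min, x_max, y_max) character bounding boxes
           (hypothetical input format, image coordinates).
    Returns (top_line, bottom_line, skew_degrees), where each line is (a, b).
    """
    # Sample the top and bottom edge of every box at its horizontal centre.
    top = [((x0 + x1) / 2, y0) for x0, y0, x1, y1 in boxes]
    bottom = [((x0 + x1) / 2, y1) for x0, y0, x1, y1 in boxes]
    top_line = fit_line(top)
    bottom_line = fit_line(bottom)
    # Assumption: the skew is taken as the mean slope of the two boundaries.
    slope = (top_line[0] + bottom_line[0]) / 2
    return top_line, bottom_line, math.degrees(math.atan(slope))
```

The sign of the returned angle follows the coordinate system of the boxes; in image coordinates, where y grows downwards, a positive angle corresponds to text that descends from left to right. Near-vertical text lines would need the roles of x and y swapped before fitting, which this sketch does not handle.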