A Tracking Approach for Text Line Segmentation in Handwritten Documents

Insaf Setitra, Zineb Hadjadj and Abdelkrim Meziane

Research Center on Scientiﬁc and Technical Information, CERIST, Algiers, Algeria

{isetitra, zhadjadj, ameziane}@cerist.dz

Keywords:

Tracking, Connected Components, Angle Minimization, Segmentation, Line, Handwritten, Prediction.

Abstract:

Tracking of objects in videos consists of giving a label to the same object moving in different frames. This labelling is performed by predicting the position of the object given its set of features observed in previous frames. In this work, we apply the same rationale by considering each connected component in the manuscript as a moving object and tracking it so as to minimize the distance and angle of the connected component to its nearest neighbour. The approach was applied to images of the ICDAR 2013 handwritten segmentation contest and, in preliminary visual results, proved robust to text orientation, size and writing script.

1 INTRODUCTION

Text line segmentation is considered a non-trivial task in the field of handwritten document recognition (Stamatopoulos et al., 2013). By and large, many challenges can occur when segmenting handwritten document images, such as skew in text lines and adjacency of text lines. To address some of these challenges, the literature offers a variety of text line segmentation techniques. Li et al. (Li et al., 2008) proposed an approach for handwritten text line segmentation using level sets. Goto and Aso (Goto and Aso, 1999) proposed a local linearity based method to detect text lines in English and Chinese documents. In the method proposed by Hönes and Lichter (Hönes and Lichter, 1994), text lines are generated by expanding the line anchors of the document image. The main drawback of the previously cited methods is that they cannot handle variable-sized text.

Roy et al. (Roy et al., 2012) proposed text line extraction using foreground and background information. Louloudis et al. (Louloudis et al., 2007) used a block-based Hough transform for text line extraction. In the method proposed by Loo and Tan (Loo and Tan, 2002), irregular pyramids are used for text line segmentation. Recently, Bukhari et al. (Bukhari et al., 2008) proposed a line segmentation approach for camera-based warped documents using active contour models. Gatos et al. (Gatos et al., 2007) proposed an algorithm based on text line and word detection for warped documents. Bai (Bai, 2008) used a traditional perceptual grouping-based algorithm for extracting curved lines. Pal and Roy (Pal and Roy, 2004) proposed a head-line based technique for extracting multi-oriented (printed in several orientations) and curved text lines from Indian documents. In other work, Pal et al. (Pal et al., 2003) developed a system for English multi-oriented text line extraction that estimates the equation of the text line from character information.

Although the cited approaches are competitive, they still lack universality, and text line segmentation, especially in curved documents, remains an open problem. This paper describes a new approach, inspired by tracking methods, to detect lines in handwritten document images.

Basically, tracking is the process of following objects through an image sequence (Mitiche and Aggarwal, 2014). The earliest methods focused on following the trajectory of a few feature points through the sequence. Examples include the Kalman filter (Bar-Shalom, 1987) (Broida and Chellappa, 1986) (Boykov and Huttenlocher, 2000), which has been applied in works such as (Li et al., 2010) and (Zheng et al., 2012).

In our approach, each cluster of connected pixels (which can be a word or part of a word) is considered as a moving object that seeks its best match, i.e. the one satisfying a trajectory angle. The position of the cluster (referred to as a connected component in the remainder of the paper) is predicted using its previous position, and the angle between the previous and current observations must be minimized in order to find the best match of the current cluster. Clusters are tracked and matched pair by pair until the end of the document, and lines are the result of that matching. We benefit from the tracking rationale, in which the position of clusters is predicted according to their history, while avoiding usual tracking issues since we do not rely on any feature of the cluster. This is justified because clusters (words or parts of words in the handwritten document image) need not be similar in shape to their best matches, nor be a weak deformation of them.

Setitra, I., Hadjadj, Z. and Meziane, A.

A Tracking Approach for Text Line Segmentation in Handwritten Documents.

DOI: 10.5220/0006199001930198

In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2017), pages 193-198

ISBN: 978-989-758-222-6

We explain our approach in more detail in Section 2, present and discuss a preliminary set of tests in Section 3, and conclude in Section 4.

2 OUR APPROACH

The input of our algorithm is a binary manuscript image. The first step is therefore not to segment the image but to directly extract the connected components that will be tracked over the manuscript image. In our case, connected components are sets of pixels linked by 4-connectivity. We also remove small connected components, since they can mislead the line segmentation; examples of such regions include dots (dark pink dots in Figure 1). As a result of this first step, we have a set of connected components, each represented by the row and column indices of its centre (yellow dots in Figure 1). In order to group connected components into lines, we first match connected components pair by pair, so that the two components of each matched pair lie on the same line. Two connected components are assumed to be on the same line if they are spatially close to each other and if the angle they form with their origin is below a certain threshold.
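The extraction step can be sketched as follows. This is a minimal, dependency-free Python illustration rather than the authors' MATLAB implementation. It assumes the binary image is a list of rows with 1 for ink, labels 4-connected components by breadth-first search, drops components below a minimum size (200 pixels, the value given in Section 3), and takes each component's centre as the mean row and column of its pixels. That centre definition and the name `component_centres` are assumptions for illustration.

```python
from collections import deque


def component_centres(image, min_size=200):
    """Return the (row, col) centres of 4-connected ink components.

    `image` is a binary image given as a list of rows (1 = ink, 0 = background).
    Components with fewer than `min_size` pixels (dots, noise) are discarded.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centres = []
    for r0 in range(rows):
        for c0 in range(cols):
            if image[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first search restricted to the 4-neighbourhood.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) >= min_size:
                # Assumed centre: mean row and mean column of the component's pixels.
                centres.append((sum(p[0] for p in pixels) / len(pixels),
                                sum(p[1] for p in pixels) / len(pixels)))
    return centres
```

With 4-connectivity, two ink pixels that touch only diagonally form two separate components.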

More specifically, let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of connected component centres such that $x_i = \big(x_i^{(1)}, x_i^{(2)}\big)$ is a 2-dimensional vector holding the row index and the column index of centre $x_i$. Let also $D(x_k, x_i)$ be the Euclidean distance between $x_k$ and $x_i$, where $i$ ranges from 1 to $n$ and $k$ is a random value chosen from 1 to $n$. $D(x_k, x_i)$ is computed as follows:

$$D(x_k, x_i) = \sqrt{\big(x_k^{(1)} - x_i^{(1)}\big)^2 + \big(x_k^{(2)} - x_i^{(2)}\big)^2} \tag{1}$$

Figure 1: Example of connected component analysis for line segmentation. Red region: connected component to be matched. Gray circle: region of interest, centred at the connected component to be matched (red region), containing the set of connected components nearest to it. The region of interest includes small connected components to be removed (pink regions) and connected components to be considered for comparison and matching (blue regions). Black regions are connected components far from the connected component to be matched (red region); they are not considered for comparison and matching. Yellow dots are centres of connected components. Note that these centres are drawn only for visualization and are not the actual centres returned by our implementation.

By sorting $D(x_k, x_i)$ and taking the first $N$ points in this order, we obtain a subset of $X$ with $N$ elements, each representing one of the connected components closest to $x_k$. Let $Y = \{y_1, y_2, \ldots, y_N\}$ be the set of the $N$ closest connected components to $x_k$. Geometrically, this set can be seen as a polygon centred at $x_k$ and having at most $N - 1$ vertices; the number of vertices can be less than $N - 1$ because some vertices can be aligned. From the set $Y$, we would like to find the best match of $x_k$ among the $y_i$, i.e. we would like to know whether $x_k$ and $y_i$ are aligned and how close they are. We do this by computing the inner angle between $x_k$, $y_i$ and the origin $x_o$.
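Equation (1) and the selection of the candidate set $Y$ can be sketched as below. The sketch assumes centres are (row, column) tuples and excludes $x_k$ itself from its own neighbour list (otherwise it would always come first, at distance 0); the function names are hypothetical. The default $N = 20$ is the value reported in Section 3.

```python
import math


def distance(a, b):
    """Equation (1): Euclidean distance between two (row, col) centres."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def nearest_neighbours(centres, k, n_neighbours=20):
    """Return Y, the n_neighbours centres closest to centres[k] (x_k itself excluded)."""
    x_k = centres[k]
    others = [c for i, c in enumerate(centres) if i != k]
    # Sort by D(x_k, x_i) and keep the first N elements.
    return sorted(others, key=lambda c: distance(x_k, c))[:n_neighbours]
```

If fewer than $N$ other centres exist, the whole remaining set is returned.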

If $x_k$ is the first element of $X$, then $x_o$ has coordinates $x_o^{(1)} = x_k^{(1)}$ and $x_o^{(2)} = x_k^{(2)} + 1$. This means that $x_o$ is aligned with $x_k$ on the same row, and $y_{i^*}$ (the best match to $x_k$) must minimize the inner angle with the horizon. We choose $x_o$ aligned with $x_k$ because, for the first element, we assume that the writing in the manuscript is horizontal. We treat the case where the writing takes another direction (say up to the right or down to the right) later on, by taking the history of the matching into account. We now explain in more detail how we compute and minimize the angle between $x_k$ and $y_i$. We keep the same notation as previously and use (without justification) some basic notions of geometry.

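The inner angle between $x_o$, $x_k$ and $y_i$, centred at $x_k$, can be derived as follows.

First, the vector between $x_k$ and $x_o$ is computed using:

$$\overrightarrow{x_k x_o} = x_o - x_k \tag{2}$$

$$\overrightarrow{x_k x_o} = \begin{pmatrix} x_o^{(1)} - x_k^{(1)} \\ x_o^{(2)} - x_k^{(2)} \end{pmatrix} \tag{3}$$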

ICPRAM 2017 - 6th International Conference on Pattern Recognition Applications and Methods


Figure 2: Example of angle computation: the inner angle between $x_o$, $x_k$ and $y_i$ centred at $x_k$.

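Then, the vector between $x_k$ and $y_i$ is computed using:

$$\overrightarrow{x_k y_i} = y_i - x_k \tag{4}$$

$$\overrightarrow{x_k y_i} = \begin{pmatrix} y_i^{(1)} - x_k^{(1)} \\ y_i^{(2)} - x_k^{(2)} \end{pmatrix} \tag{5}$$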

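The dot product of $\overrightarrow{x_k x_o}$ and $\overrightarrow{x_k y_i}$ is given by

$$\overrightarrow{x_k x_o} \cdot \overrightarrow{x_k y_i} = \big\|\overrightarrow{x_k x_o}\big\| \, \big\|\overrightarrow{x_k y_i}\big\| \cos\theta \tag{6}$$

where $\theta$ is the inner angle between $x_o$ and $y_i$ centred at $x_k$, and $\|\overrightarrow{x_k x_o}\|$ is the length of the vector $\overrightarrow{x_k x_o}$, computed as:

$$\big\|\overrightarrow{x_k x_o}\big\| = \sqrt{\big(x_o^{(1)} - x_k^{(1)}\big)^2 + \big(x_o^{(2)} - x_k^{(2)}\big)^2} \tag{7}$$

$\|\overrightarrow{x_k y_i}\|$ is computed the same way.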

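From equation (6), we obtain the angle $\theta$ as follows:

$$\theta = \arccos\left( \frac{\overrightarrow{x_k x_o} \cdot \overrightarrow{x_k y_i}}{\big\|\overrightarrow{x_k x_o}\big\| \, \big\|\overrightarrow{x_k y_i}\big\|} \right) \tag{8}$$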
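Equations (2) to (8) translate directly into a small function. In this hypothetical sketch the cosine ratio is clamped to $[-1, 1]$ before calling `math.acos`, since floating-point rounding can push it slightly outside the domain of the arccosine. The angle is returned in radians and is undefined when $x_o$ or $y_i$ coincides with $x_k$.

```python
import math


def inner_angle(x_k, x_o, y_i):
    """Equations (2)-(8): angle at x_k between x_k->x_o and x_k->y_i, in radians."""
    u = (x_o[0] - x_k[0], x_o[1] - x_k[1])   # equations (2)-(3)
    v = (y_i[0] - x_k[0], y_i[1] - x_k[1])   # equations (4)-(5)
    dot = u[0] * v[0] + u[1] * v[1]          # left-hand side of equation (6)
    norm_u = math.hypot(*u)                  # equation (7)
    norm_v = math.hypot(*v)
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero-length vector")
    cos_theta = dot / (norm_u * norm_v)
    # Clamp to [-1, 1]: rounding can push the ratio slightly outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # equation (8)
```

With $x_k = (0, 0)$ and the horizontal origin $x_o = (0, 1)$, a candidate on the same row to the right gives $\theta = 0$, and one directly to the left gives $\theta = \pi$.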

Figure 2 shows a general case of computing the angle between the three points ($x_o$, $x_k$, $y_i$). In the case of the figure, $x_o$ is not on the horizontal axis but is instead an arbitrary point in space.

Once the angles between $x_k$ and all $y_i$ are computed, the next step is to choose, as the best match of $x_k$, the $y_i$ that minimizes the angle computed previously. Since $x_k$ and $x_o$ are fixed for the point $x_k$, the only variable is $y_i$. Let then $\theta$ in equation (8) be equal to $f(y_i)$. The minimization of the angle $\theta = f(y_i)$ is then performed with respect to $y_i$ as follows:

$$\operatorname*{arg\,min}_{y_i \in Y} f(y_i) := \big\{\, y_{i^*} \in Y \;\big|\; \forall y_i \in Y : f(y_{i^*}) \le f(y_i) \,\big\} \tag{9}$$
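Equation (9) amounts to keeping the candidate with the smallest angle. The sketch below repeats a compact version of the angle computation so that it runs on its own. `best_match` is a hypothetical name; when several candidates share the minimum angle it simply returns the first one, whereas equation (9) defines the set of all minimizers. Candidates are assumed to be distinct from $x_k$.

```python
import math


def inner_angle(x_k, x_o, y_i):
    """Equation (8): angle at x_k between x_k->x_o and x_k->y_i, in radians."""
    u = (x_o[0] - x_k[0], x_o[1] - x_k[1])
    v = (y_i[0] - x_k[0], y_i[1] - x_k[1])
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def best_match(x_k, x_o, candidates):
    """Equation (9): return (y_star, f(y_star)) for the candidate with the smallest angle.

    Ties are broken by candidate order; an empty candidate list raises ValueError.
    """
    scored = [(y_i, inner_angle(x_k, x_o, y_i)) for y_i in candidates]
    return min(scored, key=lambda pair: pair[1])
```

For a first element (origin one column to the right of $x_k$), `best_match((5, 5), (5, 6), [(2, 8), (5, 12), (9, 6)])` selects `(5, 12)`, the candidate on the same row as $x_k$, with an angle of 0.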

Figure 3: Example of angle minimization when $x_k$ is the first connected component centre in $X$. Black regions are connected components and yellow regions are centres of each connected component.

Figure 3 shows an example of this minimization. In the figure, the candidate chosen as the best match to $x_k$ is the one whose angle with $x_o$ is the smallest among the candidates shown.

In the more general case, we choose the $y_{i^*}$ that minimizes the angle between the origin $x_o$ and the current point to match, $x_k$. A further refinement is applied: if the angle formed by $x_k$ and $y_{i^*}$ with the origin is higher than a certain threshold $\varepsilon$, the matching is rejected. Note that this threshold is initialized to $+\infty$; we update $\varepsilon$ after we accept the first match.

Once the first match of $x_k$ to $y_{i^*}$ is chosen, $\varepsilon$ must be updated: in this step, $\varepsilon$ becomes the angle $f(y_{i^*})$. This update is necessary because, if we accepted the minimum angle without thresholding, we could match a connected component at the end of a line to a connected component of another line. Thresholding allows the tracking of a connected component to stop, especially when the line in the manuscript ends.
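The thresholding rule can be read as the small state machine below. This is one literal reading of the text, not the authors' code: the threshold starts at $+\infty$, so the first match is always accepted, and it is then set to the angle of that first accepted match. The text does not say whether one threshold is kept per line or per document, so the hypothetical `AngleGate` class leaves that choice to the caller. Note also that Section 3 reports a fixed value $\varepsilon = 0.2$.

```python
import math


class AngleGate:
    """Hypothetical accept/reject rule for matches based on the angle threshold epsilon."""

    def __init__(self):
        self.epsilon = math.inf  # initialised to +infinity
        self.has_match = False

    def accept(self, theta):
        """Return True if a match with angle `theta` (radians) is accepted."""
        if theta > self.epsilon:
            return False  # too wide: likely the end of the line in the manuscript
        if not self.has_match:
            # After the first accepted match, epsilon becomes its angle f(y_{i*}).
            self.epsilon = theta
            self.has_match = True
        return True
```

Under this reading, a gate whose first accepted angle is 0.3 then accepts 0.25 but rejects 0.5.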

At this point, the matching results in a 3-dimensional vector whose first dimension is $x_k$, whose second is $y_{i^*}$ and whose third is the corresponding angle $f(y_{i^*})$.

When $x_k$ is not the first point to be matched, we obtain the history of $x_k$ by looking at the first previous neighbour of $x_k$, which is $x_{k-1}$. The process, from the choice of $k$ in $x_k$ to the angle minimization step, is repeated for all points in $X$.

Figure 4: Example of angle minimization when $x_k$ is not the first connected component centre in $X$ (angles have history). Black regions are connected components and yellow regions are centres of each connected component. The best match of $x_k$ is given to the candidate that follows the angle history.

For the remaining points in $X$, one obvious observation is that they will have a history, i.e. at least one previous point in $X$ has already been matched. This is crucial to know the writing style in the manuscript. For example, if the writing in the manuscript went from the top to the bottom of the page, then the origin $x_o$ will not be on the horizontal line but will instead be calculated as follows.

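Let $x_k$ be the point to be matched and $x_{k-1}$ be the first previous neighbour of $x_k$. To compute the history of writing, i.e. the previous angle, we need to compute the coordinates of $x_o$. They are computed as follows:

$$\partial x = \frac{x_k^{(1)} - x_{k-1}^{(1)}}{k - (k-1)} = x_k^{(1)} - x_{k-1}^{(1)} \tag{10}$$

$$\partial y = \frac{x_k^{(2)} - x_{k-1}^{(2)}}{k - (k-1)} = x_k^{(2)} - x_{k-1}^{(2)} \tag{11}$$

where $\partial x$ and $\partial y$ are the displacements from $x_{k-1}$ to $x_k$ along the first (row) and second (column) coordinates respectively. The coordinates of $x_o$ can then be computed as follows:

$$\begin{cases} x_o^{(1)} = x_k^{(1)} + \partial x \\ x_o^{(2)} = x_k^{(2)} + \partial y \end{cases} \tag{12}$$

where $x_k^{(1)}$ and $x_k^{(2)}$ are the row and column indices of $x_k$ respectively.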
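Equations (10) to (12), together with the horizontal origin used for the first element, fit in one helper. The name `predict_origin` and the (row, column) tuple layout are assumptions. The helper extrapolates the last displacement $x_{k-1} \to x_k$ one step further, which is the constant-velocity prediction familiar from tracking.

```python
def predict_origin(x_k, x_prev=None):
    """Return the origin x_o used to measure angles at x_k (points are (row, col) tuples)."""
    if x_prev is None:
        # First element: no history, so assume horizontal writing (same row, next column).
        return (x_k[0], x_k[1] + 1)
    d_x = x_k[0] - x_prev[0]  # equation (10): displacement along the first coordinate
    d_y = x_k[1] - x_prev[1]  # equation (11): displacement along the second coordinate
    return (x_k[0] + d_x, x_k[1] + d_y)  # equation (12)
```

For example, with $x_{k-1} = (12, 15)$ and $x_k = (10, 20)$, the writing moves up and to the right, and the predicted origin is $(8, 25)$.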

This computation is important since we would like to keep a history of writing based on the previous matching. This follows the principle of different types of tracking, especially the Kalman filter [ref], which keeps a history of previous observations. Note that our approach differs from these tracking approaches in that we do not keep the whole history but only the previous observation of the angle, which makes it most similar to Hidden Markov model approaches.

Once the coordinates of $x_o$ are computed, the same process is repeated to get the best match of $x_k$. Figure 4 shows an example of computing $x_o$ and obtaining the best match according to the angle history. In the figure, the best match is not the candidate with the smallest angle to the $x$-axis: the retained candidate forms a bigger angle with the $x$-axis, but it is chosen because the matching follows the same pattern as the previous matching.
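Putting the previous steps together, one line can be followed from a starting component as in the hypothetical driver below. It is a sketch under stated assumptions, not the published implementation: components already used in the current line are skipped, and for simplicity it applies a fixed angle threshold (0.2, the value reported in Section 3, assumed here to be in radians) instead of the $+\infty$-then-update rule.

```python
import math


def _angle(x_k, x_o, y):
    """Inner angle at x_k between x_k->x_o and x_k->y (equation 8), in radians."""
    u = (x_o[0] - x_k[0], x_o[1] - x_k[1])
    v = (y[0] - x_k[0], y[1] - x_k[1])
    cos_t = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def track_line(centres, start, n_neighbours=20, epsilon=0.2):
    """Follow one line from centres[start]; return a list of (x_k, y_star, theta) matches."""
    used = {start}
    x_prev, x_k = None, centres[start]
    matches = []
    while True:
        if x_prev is None:
            x_o = (x_k[0], x_k[1] + 1)  # no history: assume horizontal writing
        else:
            # Equations (10)-(12): extrapolate the last displacement one step.
            x_o = (2 * x_k[0] - x_prev[0], 2 * x_k[1] - x_prev[1])
        # Candidate set Y: the N nearest centres not yet used in this line.
        free = [i for i in range(len(centres)) if i not in used and centres[i] != x_k]
        free.sort(key=lambda i: math.hypot(centres[i][0] - x_k[0], centres[i][1] - x_k[1]))
        candidates = free[:n_neighbours]
        if not candidates:
            break
        # Equation (9): keep the candidate with the smallest inner angle.
        best = min(candidates, key=lambda i: _angle(x_k, x_o, centres[i]))
        theta = _angle(x_k, x_o, centres[best])
        if theta > epsilon:
            break  # angle too wide: treat it as the end of the line
        matches.append((x_k, centres[best], theta))
        used.add(best)
        x_prev, x_k = x_k, centres[best]
    return matches
```

With four evenly spaced centres on row 0 and a second line at row 50, `track_line([(0, 0), (0, 10), (0, 20), (0, 30), (50, 0), (50, 10)], 0)` follows row 0 and stops once the only remaining candidates lie on the other line.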

3 EXPERIMENTAL RESULTS

We tested our approach on images of the ICDAR 2013 Handwriting Segmentation Contest (Stamatopoulos et al., 2013). The dataset consists of 150 document images written in English and Greek and 50 images written in Bangla, along with the associated ground truth, for training, and of 50 images written in English, 50 images written in Greek and 50 images written in Bangla for testing (Stamatopoulos et al., 2013). The dataset is challenging in that the skew angle varies both between text lines and within the same line.

We implemented our approach in MATLAB 2010 and set the angle threshold to $\varepsilon = 0.2$, the number of neighbours of the connected component to track to $N = 20$, and the minimum size of connected components to 200 pixels.

In the experimental phase, we draw a blue line between the centres of each pair of matched connected components found as described in Section 2. When these lines are accumulated, they clearly show the direction of motion of words (the connected components matched previously). Although visual observation suggests that the approach is robust, a quantitative analysis is necessary to validate the approach and extend it to other datasets. We did not include a quantitative analysis in this work because our approach only links pairs of connected components. If we applied the evaluation software proposed in the contest, as described in Sections II and III of (Stamatopoulos et al., 2013) and in (con, ), we would obtain a low accuracy, because each pair of connected components would be considered as a line. To solve this issue, we propose two solutions: either cluster the blue lines (between matched connected components) so that each cluster constitutes a line in the handwritten document, or track, among all pairs, the first connected component so as to obtain its complete trajectory along the line. We leave this improvement to future work and present here only the main steps of the approach and a preliminary result.
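The first of these two options can be sketched by treating every matched pair as an edge and taking the connected components of the resulting graph as lines, here with a union-find structure. This is our illustration of the idea, not part of the evaluated method; the function name is hypothetical, and a single wrong match between two lines would merge them into one group.

```python
def group_pairs_into_lines(pairs):
    """Group matched (a, b) pairs into lines: centres linked by a chain of matches share a line."""
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps the trees shallow
            p = parent[p]
        return p

    # Each matched pair merges the sets of its two centres.
    for a, b in pairs:
        parent[find(a)] = find(b)

    lines = {}
    for p in parent:
        lines.setdefault(find(p), []).append(p)
    # Sort for a deterministic output: centres within a line, then lines.
    return sorted(sorted(line) for line in lines.values())
```

Pairs produced by a tracker that outputs (x_k, y_star, theta) triples can be passed as `[(a, b) for a, b, _ in matches]`.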

Figure 5 shows examples of applying our approach to images of the contest. Although the images have different orientations, our approach can still detect lines in the handwritten documents. However, two main drawbacks can be observed. First, several connected components were ignored in the processing: those smaller than the size threshold of 200 pixels defined previously. The second drawback can be observed in the last two handwritten lines of Figure 5(b), where some blue lines from the two handwritten lines are merged. Since some components have their centres far from the centre of the word they belong to, and because of their previous observation, they are matched wrongly. One possible solution is to enhance the resolution of the original image so that each line keeps its own connected components close to each other.

Figure 5: Result of our approach on images of the ICDAR 2013 Handwriting Segmentation Contest. (a) Result of processing image 202.tif, (b) result of processing image 337.tif, (c) result of processing image 342.tif, (d) result of processing image 343.tif.

4 CONCLUSION

In this work, we presented a new approach for handwritten text line segmentation inspired by various tracking approaches. The aim of the approach is to track each pair of connected components in the handwritten document that satisfies angle minimization. The approach is suitable when connected components in the handwritten document are close to each other, independently of the skew and line orientation. However, the approach can fail when connected components from different lines are close. In this case, they are merged into the same line, since the main information used in our approach is the centre of the connected component. The approach gave acceptable visual results but needs to be enhanced with complete tracking so that a quantitative analysis can be performed. As a new approach, it can be further enhanced and may open a new way of detecting lines in handwritten documents using word tracking.

ACKNOWLEDGEMENT

The authors would like to thank the program chairs of the International Document Image Processing Summer School (IDIPS 2015) (IDI, ) for their introduction to the ICDAR 2013 Handwriting Segmentation Contest during the summer school and for their pertinent explanations and remarks.

REFERENCES

http://users.iit.demokritos.gr/~nstam/ICDAR2013HandSegmCont.

http://samosweb.aegean.gr/idips2015/.

Bai, N. N. (2008). Extracting curved text lines using the chain composition and the expanded grouping method.

Bar-Shalom, Y. (1987). Tracking and Data Association. Academic Press Professional, Inc., San Diego, CA, USA.

Boykov, Y. and Huttenlocher, D. P. (2000). Adaptive Bayesian recognition in tracking rigid objects. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 2, pages 697–704.


Broida, T. J. and Chellappa, R. (1986). Estimation of object motion parameters from noisy images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(1):90–99.

Bukhari, S. S., Shafait, F., and Breuel, T. M. (2008). Segmentation of curled textlines using active contours. In Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on, pages 270–277.

Gatos, B., Pratikakis, I., and Ntirogiannis, K. (2007). Segmentation based recovery of arbitrarily warped document images. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), volume 2, pages 989–993.

Goto, H. and Aso, H. (1999). Extracting curved text lines using local linearity of the text line. International Journal on Document Analysis and Recognition, 2(2):111–119.

Hönes, F. and Lichter, J. (1994). Layout extraction of mixed mode documents. Machine Vision and Applications, 7(4):237–246.

Li, X., Wang, K., Wang, W., and Li, Y. (2010). A multiple object tracking method using Kalman filter. In Information and Automation (ICIA), 2010 IEEE International Conference on, pages 1862–1866.

Li, Y., Zheng, Y., Doermann, D., Jaeger, S., and Li, Y. (2008). Script-independent text line segmentation in freestyle handwritten documents. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(8):1313–1329.

Loo, P. K. and Tan, C. L. (2002). Document Analysis Systems V: 5th International Workshop, DAS 2002 Princeton, NJ, USA, August 19–21, 2002 Proceedings, chapter Word and Sentence Extraction Using Irregular Pyramid, pages 307–318. Springer Berlin Heidelberg, Berlin, Heidelberg.

Louloudis, G., Gatos, B., and Halatsis, C. (2007). Text line detection in unconstrained handwritten documents using a block-based Hough transform approach. In Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02, ICDAR '07, pages 599–603, Washington, DC, USA. IEEE Computer Society.

Mitiche, A. and Aggarwal, J. (2014). Computer Vision Analysis of Image Motion by Variational Methods. Springer International Publishing.

Pal, U. and Roy, P. P. (2004). Multioriented and curved text lines extraction from Indian documents. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(4):1676–1684.

Pal, U., Sinha, S., and Chaudhuri, B. B. (2003). Image Analysis: 13th Scandinavian Conference, SCIA 2003 Halmstad, Sweden, June 29 – July 2, 2003 Proceedings, chapter Multi-oriented English Text Line Identification, pages 1146–1153. Springer Berlin Heidelberg, Berlin, Heidelberg.

Roy, P. P., Pal, U., and Lladós, J. (2012). Text line extraction in graphical documents using background and foreground information. International Journal on Document Analysis and Recognition (IJDAR), 15(3):227–241.

Stamatopoulos, N., Gatos, B., Louloudis, G., Pal, U., and Alaei, A. (2013). ICDAR 2013 handwriting segmentation contest. In 2013 12th International Conference on Document Analysis and Recognition, pages 1402–1406.

Zheng, B., Xu, X., Dai, Y., and Lu, Y. (2012). Object tracking algorithm based on combination of dynamic template matching and Kalman filter. In Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2012 4th International Conference on, volume 2, pages 136–139.
