INCORPORATING A NEW RELATIONAL FEATURE IN ARABIC
ONLINE HANDWRITTEN CHARACTER RECOGNITION
Sara Izadi and Ching Y. Suen
Centre for Pattern Recognition and Machine Intelligence, 1455 de Maisonneuve Blvd. West, Montreal, Canada
Keywords:
Feature extraction, online handwriting, neural network, recognition.
Abstract:
Artificial neural networks have shown good performance in classification tasks. However, models used for
learning in pattern classification are challenged when the differences between the patterns of the training set
are small. Therefore, the choice of effective features is mandatory for obtaining good performance. Statistical
and geometrical features alone are not suitable for recognition of hand printed characters due to variations in
writing styles that may result in deformations of character shapes. We address this problem by using a rela-
tional context feature combined with a local descriptor for training a neural network-based recognition system
in a user-independent online character recognition application. Our feature extraction approach provides a rich
representation of the global shape characteristics, in a considerably compact form. This new relational feature
provides a higher distinctiveness and increases robustness with respect to character deformations. While en-
hancing the recognition accuracy, the feature extraction is computationally simple. We show that the ability
to discriminate between Arabic handwritten characters is increased by adopting this mechanism in a feed-forward
neural network architecture. Our experiments on Arabic character recognition show comparable results with
the state-of-the-art methods for online recognition of these characters.
1 INTRODUCTION
For a variety of devices such as hand-held computers,
smart phones and personal digital assistants (PDAs),
the integrated online handwriting recognition inter-
face is a useful complement to keyboard-based data
entry. Variation (the unique way of writing of each
individual) and Variability (the changes of a specific
individual in producing different samples at different
times) are the main challenges for automatic hand-
writing recognition regardless of the type of appli-
cation or device. Both improving the recognition
performance and extending the number of supported
languages for those devices have attracted increasing
research interests in the field of online handwriting
recognition.
The writing environment in online recognition
systems is the surface of a digitizer instead of a piece
of paper. The trace of the pen tip is recorded as a
sequence of points, sampled in equally-spaced time
intervals (x(t),y(t)). Any sequence of points from a
pen-down state to the next pen-up state is called a
stroke. Dynamic and temporal information in on-
line systems can cause more ambiguity by introducing
more inter-class variability. Variations in the number
of strokes and the order of strokes make the recogni-
tion task more complex than in the off-line case. Recog-
nition of scripts with complex character shapes and
a large number of symbols makes it even more dif-
ficult. Arabic online handwriting recognition is less
explored, due to the cursive characteristics of the sym-
bols. Besides having huge variations in different peo-
ple’s writings, there are a lot of similarities between
different letters in the alphabet. The Arabic alpha-
bet consists of 28 letters, in 17 main shapes. These
shapes are shared between some letters. Figure 1
shows 17 representative shapes from each class of
very similar characters when dots are ignored. Dot(s)
can appear above, below or inside a letter in Ara-
bic. Various classification methods have been previ-
ously studied for online systems, such as neural net-
works (Verma et al., 2004), support vector machines
(Bahlmann et al., 2002), and structural matchings on
trees and chains (Oh and Geiger, 2000). However,
selecting useful features is more crucial to both learn-
ing and recognition than the choice of the classifica-
tion method. In this paper, we adopt a global feature
inspired by shape context representation (Belongie
et al., 2002) for similarity measures between images.
This method may introduce high dimensional feature
vectors in general. However, we investigate ways to
adopt this feature for online handwriting applications
Figure 1: The 17 groups of Arabic isolated characters and
their assigned class numbers.
efficiently. The proposed technique is computation-
ally simple and provides compact, yet representative,
feature vectors. Our empirical evaluation for the recog-
nition of isolated online Arabic characters shows results
comparable to the corresponding state-of-the-art.
2 RELATIONAL APPROACH FOR
FEATURE REPRESENTATION
Features can be broadly divided into two categories:
local, and global. To compute a global feature, the
whole sequence of trajectory points is used. Local
features are computed by only considering the trajec-
tory points in a certain vicinity of a point. Global fea-
tures may provide more descriptive information about
a character than local features. However, these fea-
tures usually come with a high computational cost.
The choice of effective features is mandatory for
reaching a good performance. Our model does not use
local features; however, it does use a temporal order-
ing of observed points. It allows for spatio-temporal
data representation and augments the visual realism
of the shape of a character. Such contextual model-
ing is especially used in shape context representation
methods in the machine vision literature.
2.1 Preprocessing
In order to reduce the noise introduced by the digi-
tizing device, we need to preprocess the data. The
preprocessing operations applied for our recognition
system include three steps: first smoothing, then de-
hooking, and finally point re-sampling. Slight shakes
of the hand are the source of the noise that the smooth-
ing operation tries to remove. If each point on the
main stroke is expressed as $(x_i, y_i)$ in the Cartesian
system, then the trajectory is smoothed by a weighted
averaging (1D discrete Gaussian filtering) of each
point and its two immediate right and left neighbors:

$$(x_i, y_i) = \tfrac{1}{4}(x_{i-1}, y_{i-1}) + \tfrac{1}{2}(x_i, y_i) + \tfrac{1}{4}(x_{i+1}, y_{i+1}) \qquad (1)$$
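As an illustrative sketch (not the authors' code), the smoothing of Eq. (1) can be applied to a stroke stored as a list of (x, y) points as follows; the function name and data layout are assumptions:

```python
# Minimal sketch of the 1/4-1/2-1/4 smoothing of Eq. (1); names are illustrative.
def smooth_stroke(points):
    """points: list of (x, y) tuples sampled along one stroke."""
    if len(points) < 3:
        return list(points)
    smoothed = [points[0]]  # endpoints are kept unchanged
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        smoothed.append((0.25 * x0 + 0.5 * x1 + 0.25 * x2,
                         0.25 * y0 + 0.5 * y1 + 0.25 * y2))
    smoothed.append(points[-1])
    return smoothed
```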
Figure 2 depicts the Arabic letter “Dal” in its original
recorded form and after applying preprocessing oper-
ations. The effect of the smoothing operation is illus-
trated in Figure 2(b). In online handwriting data, a
hook-shaped noise often occurs at the start or the end
of strokes. Figure 2(a) shows a starting hook. Hooks
appear due to digitizing device inaccuracy in the de-
tection of a pen-down event, or due to fast hand mo-
tions when positioning the pen on, or lifting it off of
the writing surface. Elimination of this noise is called
de-hooking. In this research, for de-hooking, the head
or tail of a stroke is considered a hook if its length
is short compared to the length of the stroke, and
the direction of the writing undergoes a sharp angu-
lar change. The (x,y) coordinates obtained from the
graphic tablet are originally equidistant in time. How-
ever, the distribution of points along the trajectory can
be uneven due to variations in the writing speed (see
Figure 2(a)). The re-sampling of the data produces
points which are equi-distant in space instead of time.
This operation is used for either down-sampling or
up-sampling, when fewer computations or more data
points are desired, respectively. Figures 2(d) and 2(e)
show examples of re-sampling. We also apply size
normalization in the preprocessing stage in order for
all characters to have the same height, while their
original aspect ratios remain unchanged.
Figure 2: The Arabic letter “Dal” in its (a) original form,
and its preprocessed versions when (b) smoothing, (c) de-
hooking, (d) down-sampling and (e) up-sampling have been
applied.
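The re-sampling step can be sketched as below; this is a hypothetical illustration assuming points arrive as (x, y) tuples, and the spacing parameter step and the function name are not from the paper:

```python
import math

def resample_stroke(points, step):
    """Re-sample a stroke so that consecutive points are roughly 'step'
    units apart along the trajectory (equi-distant in space, not time)."""
    resampled = [points[0]]
    carried = 0.0                      # arc length covered since the last output point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue
        d = step - carried             # distance into this segment of the next output point
        while d <= seg:
            t = d / seg
            resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        carried = seg - (d - step)
    return resampled
```

Choosing step larger than the average pen sampling distance down-samples the stroke, while a smaller value up-samples it by interpolating extra points.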
2.2 Feature Extraction
Shape context is a shape descriptor proposed for mea-
suring similarities in the context of shape matching
(Belongie et al., 2002). Given that shapes are repre-
sented by a set of points sampled from their contours,
the shape context represents a coarse distribution of
the rest of the shape with respect to a given point in
the shape. For a shape with N boundary points, a
set of N − 1 vectors originating from each boundary
point to all other sample points expresses the config-
uration of the entire shape relative to that point. A
coarse histogram of the relative coordinates of those
remaining N − 1 points is defined to be the shape con-
text of that particular point. The representation and
its matching method are computationally expensive. In
applications like online handwriting recognition, all
computations need to happen in real time. Therefore,
shape context in its original form cannot be applied
directly. Our proposed feature is an adaptation of
this idea. First, we introduce some notations. We
denote the set of points on the character trajectory by P and the
set of reference points by R. Our feature selection
method is explained in Algorithm 1 which selects ad-
equate features of relational type. We call this fea-
ture RF. In the first part of this algorithm, we select
an arbitrary and fixed set R. This set of points must
not be outside the bounding box. The points in R
may capture some interrelationship structures (for in-
stance, symmetrical corners of the bounding box), or
R may contain only one point. The algorithm sam-
ples a set of points S, equi-distant in space, from
the normalized representation of the character, P. The
surrounding area of each reference point that falls in
the bounding box is divided into bins according to the
distance and the angle with respect to the reference
point. The values of all the bins are initialized to zero
for each reference point. Then, all the points in S are
described from the view of each reference point and
are placed into the corresponding bins. This is done
by computing the distance and angle between the pair
of points, dist(r, p) and angle(r, p), and updating the
corresponding bin onto which this pair is mapped.
After this step, the number of points in the bins pro-
vided by all the reference points will give a compact
representation of the character.
Algorithm 1: Relational Feature Extraction.
INPUT: A set of re-sampled trajectory points S
OUTPUT: Feature vector V
Select an arbitrary set of reference points R
Select an arbitrary set of r-bins: r_1, r_2, ..., r_n and θ-bins: θ_1, θ_2, ..., θ_k
for all r ∈ R do
    Initialize(r_bins, θ_bins)
    for all s ∈ S do
        Compute dist(r, s)
        Compute angle(r, s)
        Assign(r_bins, θ_bins, r, s)
        Update(r_bins, θ_bins)
    end for
end for
V = count(r_bins, θ_bins, r)
Return V
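As a hedged sketch of Algorithm 1 for the single-reference-point case used later (the center of the bounding box), the following Python is illustrative only; the log-spaced radial edges, the clipping of out-of-range points, and all names are assumptions rather than the authors' implementation:

```python
import math

def extract_rf(points, ref, n_r, n_theta, r_max=1.0):
    """Histogram of trajectory points over log-polar (distance, angle) bins
    centred on a reference point 'ref'; returns a flat list of bin counts."""
    rx, ry = ref
    bins = [0] * (n_r * n_theta)
    r_inner = r_max / 2 ** n_r            # innermost radial bin edge (log spacing)
    for x, y in points:
        dx, dy = x - rx, y - ry
        dist = math.hypot(dx, dy)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        if dist <= r_inner:
            ri = 0
        else:                              # log-spaced radial bins, clipped to the outermost
            ri = min(int(math.log2(dist / r_inner)), n_r - 1)
        ti = min(int(angle / (2 * math.pi / n_theta)), n_theta - 1)
        bins[ri * n_theta + ti] += 1
    return bins

# Example call: centre of a unit bounding box as the single reference point,
# with 12 x 5 = 60 bins as in the configuration described below.
# feature = extract_rf(resampled_points, ref=(0.5, 0.5), n_r=12, n_theta=5)
```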
In our experiments with a variety of point sets,
we noticed that the geometrical center of each charac-
ter bounding box provides better recognition results,
while it keeps the size of the feature vector more man-
ageable. Therefore, we used this point as the reference
point in a log-polar coordinate system.
Figure 3: Diagram of the log-polar bins around the center
of the bounding box used for relational context feature com-
putation.
Using a log-polar coordinate system makes the de-
scriptor more sensitive to differences in nearby points.
Figure 3 shows the log-polar histogram bins for com-
puting the feature of the Arabic character for the letter
S, pronounced “Sin”. The center of the circles is
located at the center of the bounding box of the nor-
malized letter. The template for extracting the context
feature for the shown diagram has 5 bins in the tan-
gential direction and 12 bins in the radial direction,
yielding a feature vector of length 60. We capture the
global characteristics of the character by this feature.
We also use a directional feature to extract the local
writing directions. The tangent along the character
trajectory is calculated as follows:
$$\operatorname{Arctan}\big((x_i - x_{i-1}) + j\,(y_i - y_{i-1})\big), \quad \text{for } i = 2, \ldots, N. \qquad (2)$$
3 RECOGNITION SYSTEM
In this paper, we have used artificial neural networks
(ANNs) as our learning classifier method. Our rela-
tional feature magnifies the differences between sim-
ilar characters and improves learning in ANNs. We
train a Multi-Layer Perceptron (MLP) with a
conjugate gradient method for classification, using a
three-layer network with 100 nodes in the hidden
layer.
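For illustration only, a comparable classifier can be set up as below, assuming scikit-learn; note that scikit-learn's MLPClassifier does not offer conjugate gradient training, so L-BFGS is used here as a stand-in, and the dummy data merely shows the expected shapes:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Dummy stand-in data: one feature vector (e.g., 60 RF bins) per character,
# with 17 class labels; replace with real feature vectors.
rng = np.random.default_rng(0)
X = rng.random((340, 60))
y = rng.integers(1, 18, size=340)

clf = MLPClassifier(hidden_layer_sizes=(100,),  # three-layer network: input, 100 hidden nodes, output
                    solver='lbfgs',             # stand-in for the paper's conjugate gradient training
                    max_iter=500)
clf.fit(X, y)
print(clf.score(X, y))
```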
4 EMPIRICAL RESULTS
We used a data set of isolated Arabic letters, courtesy
of INRS-EMT Vision Group. In this database, the
Arabic letters are divided into 17 classes according to
their main shapes (Figure 1). This database was pro-
duced by the contribution of 18 writers with a large
variety in samples (see (Mezghani et al., 2005) for de-
tails). A combination of two separately trained Koho-
nen Neural Networks in the voting scheme was previ-
ously used in the literature for recognizing online iso-
lated Arabic characters (Mezghani et al., 2003). We
evaluated the representational power of RF by train-
ing the network and performing 6-fold cross-validation
10 times. Previous studies (Mezghani et al., 2003;
Mezghani et al., 2005) tested on this database, used
288 and 144 samples of each class for training and
testing, respectively. We have used the same 2 out
of 3 ratio between the number of training and testing
samples to make our results more comparable to those
previously reported. To gain more confidence in
the reported recognition rate, we followed the experi-
mental setup explained above over ten runs and report the
statistics on the recognition results with a 95% confidence
interval. The total recognition rate is 95.2 ± 0.12%.
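How the 95% interval over the ten runs is computed is not spelled out here; a simple normal-approximation version would look like this (illustrative only):

```python
import math
import statistics

def mean_and_ci95(rates):
    """Mean recognition rate and 95% confidence half-width over repeated runs,
    using a normal approximation of the run-to-run variability."""
    mean = statistics.mean(rates)
    half_width = 1.96 * statistics.stdev(rates) / math.sqrt(len(rates))
    return mean, half_width
```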
Table 1: Performance comparison of different recognition
systems.

Performance                      1-Nearest Neighbor   Kohonen memory   RF-Tangent
Recognition rate (% vs. 1-NN)    Ref                  -1.19            +4.22
Training time                    -                    2 hrs            7.5 min
Recognition time                 526 s                26 s             23 s
Table 2: Confusion matrix for a sample run using the com-
bination of relational and directional features.
Class label | Recognized as class: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | Recog. rate %
1 143 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 99.31
2 0 137 0 0 0 1 1 0 0 1 0 0 0 4 0 0 0 95.14
3 0 0 144 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100.00
4 0 0 0 130 0 0 0 0 0 1 0 12 0 1 0 0 0 90.28
5 0 2 0 1 140 0 0 0 0 0 0 1 0 0 0 0 0 97.22
6 0 1 0 1 0 137 0 0 0 2 0 0 1 0 0 0 2 95.14
7 0 1 0 0 0 1 142 0 0 0 0 0 0 0 0 0 0 98.61
8 0 0 0 0 0 0 0 144 0 0 0 0 0 0 0 0 0 100.00
9 0 0 0 0 0 0 0 0 141 0 0 0 1 0 1 0 1 97.92
10 0 6 0 0 1 2 1 0 0 133 0 0 0 0 0 0 1 92.36
11 0 0 0 0 0 0 0 0 0 0 144 0 0 0 0 0 0 100.00
12 0 0 0 7 0 0 0 0 0 1 0 133 0 3 0 0 0 92.36
13 0 0 0 0 0 1 0 0 1 0 0 0 139 0 0 3 0 96.54
14 0 0 0 3 0 0 0 0 0 0 0 2 0 132 3 0 4 91.67
15 0 0 0 0 0 1 1 0 0 0 1 0 1 1 139 0 0 96.53
16 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 141 0 97.92
17 0 1 0 0 0 0 0 0 0 1 0 0 0 2 5 0 135 93.75
Total rate: 96.16%
Table 1 summarizes the results. The first column
presents our experiments with the 1-Nearest Neighbor
classifier as this classifier is often used as a bench-
mark. Its asymptotic error rate is less than twice the
optimal Bayes rate. However, its asymptotic through-
put is zero. The second column shows the results reported
in (Mezghani et al., 2005), and the last column shows
the results of our experiments with a feed-forward MLP
neural network classifier equipped with the RF fea-
ture and the tangent feature, described in Section 3,
conducted on a Pentium(R) D with a 3.40-GHz CPU. In
the first row, the recognition rates of all methods are pre-
sented. The recognition rate of the 1-Nearest Neigh-
bor classifier is taken as the reference point for
this comparison. Our method shows a 4.22% improve-
ment, on average, compared to that achieved by the
1-Nearest Neighbor classifier, whereas the results in
(Mezghani et al., 2005) in the best case reach only
1.19% below the 1-Nearest Neighbor classifier. The
training and recognition times are presented
in the second and third rows of Table 1. Although
the results reported in (Mezghani et al., 2005) are
based on a slightly faster machine than ours, recogni-
tion by our method is faster and the training time is
significantly shorter than that in (Mezghani et al.,
2005). The confusion matrix of a sample run is pre-
sented in Table 2. The recognition rates for different
classes vary. On average, our recognition rate for dif-
ferent letters was more robust than the best ones re-
ported in (Mezghani et al., 2005). This confirms the
high discrimination ability of the feature vectors in-
troduced in our recognition system.
5 CONCLUSIONS
We introduced a relational feature for online handwrit-
ing recognition and showed the usefulness of this fea-
ture in Arabic character recognition. Our experiments
suggest that this feature representation improves the
state-of-the-art recognition performance. This repre-
sentation provides a rich enough representational fea-
ture for the global shape of these characters. A com-
bination of this global feature and the local tangent
feature, which captures the temporal information of
the online data, improves the recognition rate over
previously reported results on the same database. In the future, we intend to
investigate the use of these features in other super-
vised learning techniques and also for online word
recognition.
REFERENCES
Bahlmann, C., Haasdonk, B., and Burkhardt, H. (2002).
On-line handwriting recognition with support vector
machines – a kernel approach. In Proceedings of the
8th International Workshop on Frontiers in Handwriting
Recognition (IWFHR-8).
Belongie, S., Malik, J., and Puzicha, J. (2002). Shape
matching and object recognition using shape contexts.
IEEE Trans. Pattern Anal. Mach. Intell., 24(4):509–
522.
Mezghani, N., Cheriet, M., and Mitiche, A. (2003). Com-
bination of pruned Kohonen maps for on-line Arabic
characters recognition.
Mezghani, N., Mitiche, A., and Cheriet, M. (2005). A new
representation of shape and its use for high perfor-
mance in online arabic character recognition by an as-
sociative memory. IJDAR, 7(4):201–210.
Oh, J. and Geiger, D. (2000). An on-line handwriting recog-
nition system using Fisher segmental matching and hy-
potheses propagation network. IEEE Conference on
Computer Vision and Pattern Recognition, 2:343–
348.
Verma, B., Lu, J., Ghosh, M., and Ghosh, R. (2004). A
feature extraction technique for online handwriting
recognition. IEEE International Joint Conference on
Neural Networks, 2:1337–1341.