AUTOMATIC SYSTEM FOR THE RECOGNITION OF AMOUNTS IN HANDWRITTEN CHEQUES

Filipe Coelho, Luis Batista, Luis F. Teixeira and Jaime S. Cardoso

INESC Porto, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal

Keywords: Bank cheques, automatic system, handwritten text recognition.

Abstract: Until the rise of electronic means for direct debit, bank cheques were used as the best form of payment, balancing security and ease of use. Their acceptance and generalized use result from international agreements that define rules for filling them in and using them. The fast processing of payments and transactions through safer electronic methods has created the need to reduce their usage over the last few years. Despite this progressive reduction, bank cheques are still used and will continue to be; therefore, their processing mechanisms need to be optimized. The existing automatic cheque processing systems are proprietary and not adapted to the Portuguese language, which is crucial for cheque analysis and recognition. A prototype of an automatic system for the recognition of the amount in Portuguese bank cheques has been implemented and is being used as a test platform for improved intelligent character recognition algorithms.

1 INTRODUCTION

Bank cheques are probably the most widespread type of document, with nearly one hundred billion cheques circulating all over the world every year. Retail banks need to assure a prompt answer to these payment requests, which amount to a significant number each day. Most cheques are still processed manually by human operators, with the reading and validation of the amount being the most common and labour-consuming operations.

The Basel II Agreement demanded better security procedures and fraud detection mechanisms in order to improve bank cheque processing. Currently, cheque recognition and validation consume a significant share of human resources, because of the multiplicity of handwriting styles that, although easily recognized by the human brain, are too difficult for electronic systems. The manual processing and verification of bank cheques therefore requires a large investment in human resources by financial institutions. Automating it may bring substantial performance gains and allow human resources to be reallocated to other tasks.

The performance of state-of-the-art Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR) algorithms (Arica and Yarman-Vural, 2001) allows the development of systems capable of recognizing handwritten text in cheques, specifically the courtesy amount (written in digits) and the legal amount (written in words), for comparison and validation. In fact, various solutions currently exist in this area. However, most of these systems are proprietary, managed by financial institutions or dedicated companies that spent years developing and fine-tuning them for specific countries (Gorski et al., 1999; Kaufmann and Bunke, 2000; Palacios and Gupta, 2002; Guillevic and Suen, 1998). As such, there is no open source system in this area adapted to the Portuguese language and handwriting.

This paper describes a complete system for the automated reading of amounts extracted from Portuguese bank cheques. We present the specification and implementation of a system integrating all the required features. It integrates ICR technology, easing the conversion of Portuguese cheques to a structured, flexible, XML-based format. The automatic processing of bank cheques is made tractable only by the contextual constraints offered by this application.

At the system level, we present the specification of the system architecture and the implementation of a prototype based on the proposed architecture. At the recognition level, we detail the pre-processing operations necessary for the successful extraction of the legal and courtesy amounts. A comparative study was conducted on the recognition of the courtesy amount, assessing the strength of different features and learning algorithms. Finally, we present a critical study of specific techniques for the recognition of the legal amount, outlining the current line of investigation.

Coelho F., F. Teixeira L., Batista L. and S. Cardoso J. (2008).
AUTOMATIC SYSTEM FOR THE RECOGNITION OF AMOUNTS IN HANDWRITTEN CHEQUES.
In Proceedings of the International Conference on Signal Processing and Multimedia Applications, pages 320-324
DOI: 10.5220/0001937303200324
SciTePress

2 SYSTEM ARCHITECTURE AND IMPLEMENTATION

In this section, we present the adopted architecture and the technologies used for the system development, as well as the research carried out on amount recognition.

2.1 General Architecture

The system proposed in this paper comprises a database of Portuguese cheques and a web-based application. The application allows Portuguese cheques to be added to the system, recognized and converted to a structured XML format in an integrated manner; at the last stage of this process, the user can confirm and correct the conversion results.

The architecture of the proposed system is based on three main entities (the web application, the processing engine and the database), as shown in Figure 1.

Figure 1: Generic system architecture.

The user interacts with the system through the web application, which allows the complete management of the cheques and associated metadata, as well as system administration. The web application allows the upload of scanned cheque images and of the examples used to design the recognition algorithms (or to improve their performance once deployed). It also allows the verification and correction of the recognition result for each uploaded cheque. Moreover, additional metadata can be inserted and linked to the cheque.

The processing engine (Figure 2) executes all the cheque analysis operations, specifically the pre-processing, the extraction of the required fields and the amount recognition.

Figure 2: Processing engine.

The database stores the scanned cheques and their digital counterparts in XML, as well as all the descriptive metadata inserted by the user and the examples used in the design of the recognition algorithms.

2.2 Prototype Implementation

We developed a prototype based on the system architecture shown in Figure 1. The prototype was developed on the Microsoft .NET 2.0 platform. The web application was developed using ASP.NET, and the processing engine is a C# application running in the background. Both modules use ADO.NET to access the SQL Server 2005 Express database. The pre-processing was implemented with AForge.NET 1.62 (http://code.google.com/p/aforge/), an open source platform for the development of digital image processing applications on the .NET platform.

The recognition of the courtesy amount was done with the Weka platform (http://www.cs.waikato.ac.nz/ml/weka/), which offers a collection of machine learning algorithms for solving data mining problems, implemented in Java and open sourced under the GPL. Weka was integrated into the prototype using the IKVM Virtual Machine, which allows Java code to be converted to libraries usable from C#. With the Weka library directly available to the system, it was possible to use its implementations of machine learning algorithms.

2.3 Processing Engine

The processing of a cheque first applies a set of pre-processing operations that facilitate the main recognition stage; the latter amounts to the recognition of the courtesy and legal amounts. The pre-processing involves:

• Noise reduction/elimination:
– median filtering: achieves both noise removal and edge preservation by using a 3 × 3 window and assigning to each pixel the median of the ordered values in that window;
– contrast stretching: improves the contrast of the image by ‘stretching’ the range of intensity values it contains to span the full range of pixel values, which facilitates the use of a fixed threshold value in the next step;
– binarization: turns the image black and white, enhancing the cheque limits and orientation for angle detection.
• Rotation angle detection and correction:
– Principal Components Analysis (PCA) over the Fourier Transform (FT) of the image (Figure 3): the rotation angle results from the slope of the first principal direction of the set of points;
– rotation with bilinear interpolation: undoes the rotation present in the original image, aligning the cheque boundaries horizontally and vertically.
• Cheque extraction: the bounding box of the cheque is found through the horizontal and vertical projections of the image (Figure 4).
• Field extraction: cheques issued by Portuguese financial institutions have a completely standardized layout, which allows us to find and retrieve their fields using known coordinates and dimensions (relative to the cheque's width) (Figure 5).
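The three noise-reduction steps listed above can be illustrated with a minimal sketch in plain Python. The prototype itself relies on AForge.NET, so this is not the authors' code: the function names, the handling of border pixels and the threshold of 128 are assumptions made for illustration. The image is represented as a list of rows of 8-bit grayscale values.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (an assumption; the paper does not
    say how borders are handled).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def stretch_contrast(img, lo=0, hi=255):
    """Linearly map the observed intensity range onto [lo, hi]."""
    flat = [p for row in img for p in row]
    pmin, pmax = min(flat), max(flat)
    if pmax == pmin:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = (hi - lo) / (pmax - pmin)
    return [[round(lo + (p - pmin) * scale) for p in row] for row in img]


def binarize(img, threshold=128):
    """Fixed-threshold binarization; 128 is an illustrative value."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]
```

Because the contrast is stretched to the full range first, a single fixed threshold can be reused across cheques scanned under different conditions, which is the motivation given in the list above.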

(a) Original Cheque. (b) Fourier transform.

Figure 3: Angle detection.
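As a rough illustration of the angle estimate, the sketch below computes the slope of the first principal direction of a set of 2-D points, using the closed-form orientation of a 2 × 2 covariance matrix. It assumes the points have already been obtained by keeping the high-magnitude coefficients of the Fourier spectrum; that selection step and the function name are assumptions, not details given in the paper.

```python
import math


def principal_angle_degrees(points):
    """Angle, in degrees within (-90, 90], of the first principal direction.

    `points` is a sequence of (x, y) pairs, e.g. the coordinates of the
    strongest Fourier coefficients (hypothetical input).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the dominant eigenvector of [[sxx, sxy], [sxy, syy]].
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
```

The image would then be rotated by the opposite of the estimated rotation, with bilinear interpolation, to align the cheque boundaries with the axes.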

(a) Rotated cheque. (b) Horizontal projection.

Figure 4: Bounding box detection.

(a) Courtesy amount. (b) Legal amount.

Figure 5: Extracted fields.

2.3.1 Courtesy Amount Recognition

The recognition of the courtesy amount first involves segmenting the individual digits composing the courtesy value. This operation uses a priori knowledge about the number of boxes in the courtesy field and filters the region by eliminating small blobs, assumed to be artefacts left by inaccurate field extraction and cleaning. The bounding box of each digit is then determined by vertical projection analysis followed by horizontal projection analysis. This separation avoids cutting digits in half because of inaccurate thresholding of the courtesy field.
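The projection-based segmentation described above can be sketched as follows. This is a minimal illustration, assuming a binarized courtesy field stored as a list of rows with 1 for ink and 0 for background; the function names and the `min_width` parameter are hypothetical, and the small-blob filtering is assumed to have been applied beforehand.

```python
def column_runs(binary, min_width=1):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(binary[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    runs, start = [], None
    for x, count in enumerate(profile + [0]):  # sentinel closes a trailing run
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            if x - start >= min_width:
                runs.append((start, x))
            start = None
    return runs


def digit_boxes(binary, min_width=1):
    """Bounding boxes (left, top, right, bottom), right/bottom exclusive."""
    boxes = []
    for left, right in column_runs(binary, min_width):
        # Horizontal projection restricted to this column range.
        rows_with_ink = [y for y, row in enumerate(binary)
                         if any(row[left:right])]
        boxes.append((left, rows_with_ink[0], right, rows_with_ink[-1] + 1))
    return boxes
```

Splitting on empty columns first and only then trimming rows mirrors the order given in the text: vertical projection analysis followed by horizontal projection analysis.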

To maximize the performance of the classification stage, we assessed different sets of features and different families of learning algorithms. As features, we considered as classifier inputs the matrix of grayscale values, the matrix of binary values (after thresholding the pixel values), and the digit contour, represented as the distance from each contour pixel to the margin of the bounding box (see Figure 6). As classifiers, we evaluated both multi-layer perceptron neural networks and support vector machines.

(a) Grayscale. (b) Threshold. (c) Contour.

Figure 6: Representation of digits.
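One possible reading of the contour representation is sketched below: for every row of a digit cropped to its bounding box, it records the distance from the left margin to the first ink pixel and from the right margin to the last ink pixel. The paper does not say which margins are used or how empty rows are handled, so the choice of left and right profiles, the value assigned to empty rows and the function name are all assumptions.

```python
def contour_profile(binary):
    """Left and right contour distances per row, concatenated into one vector.

    `binary` is a digit image cropped to its bounding box, with 1 for ink.
    Rows without ink get the full width for both distances (an assumption).
    """
    width = len(binary[0])
    left, right = [], []
    for row in binary:
        ink = [x for x, v in enumerate(row) if v]
        if ink:
            left.append(ink[0])                 # gap before the first ink pixel
            right.append(width - 1 - ink[-1])   # gap after the last ink pixel
        else:
            left.append(width)
            right.append(width)
    return left + right
```

In practice the digit would be rescaled to a fixed size first so that every sample yields a feature vector of the same length.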

2.3.2 Legal Amount Recognition

The support lines in this field, necessary for a human writer, are an obstacle to word segmentation. Therefore, the first step is to remove them. The goal of line removal is to remove as much of the lines as possible while leaving the words written on them intact. This was accomplished by counting the number of white pixels in each row and building the corresponding histogram. In this way the module was able to identify the positions of both lines as the maxima of the histogram and eliminate them. Word extraction was then performed with vertical projection analysis. The results of these individual steps are illustrated in Figure 7.
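The row-histogram idea can be sketched as follows, assuming a binarized legal field in which the counted pixels (ink and support lines) are stored as 1, and assuming each support line is one pixel thick. The rule used to keep strokes that cross a line (ink directly above and below) is a simple heuristic added for illustration, not a detail taken from the paper; the function name is hypothetical.

```python
def remove_support_lines(binary, n_lines=2):
    """Erase the n_lines rows with the highest pixel count.

    Returns the detected line rows and a cleaned copy of the image.
    A pixel on a line row is kept when there is ink directly above and
    below it, so that strokes crossing the line survive (assumed heuristic).
    """
    height = len(binary)
    counts = [sum(row) for row in binary]  # row histogram
    line_rows = sorted(range(height), key=lambda y: counts[y],
                       reverse=True)[:n_lines]
    out = [row[:] for row in binary]
    for y in line_rows:
        for x, v in enumerate(binary[y]):
            above = y > 0 and binary[y - 1][x]
            below = y + 1 < height and binary[y + 1][x]
            if v and not (above and below):
                out[y][x] = 0
    return sorted(line_rows), out
```

For thicker lines, neighbouring rows around each histogram maximum would have to be grouped before erasing; that extension is left out of this sketch.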

(a) Legal field after the application of noise reduction filters and blob filters.

(b) Legal field after the line removal algorithm.

Figure 7: Processing of the legal amount.

Three features were computed for word recognition: the existence of ascenders and descenders, loop detection, and the aspect ratio of the word. The selected features were used to train different classifiers: k-nearest neighbour, support vector machines and multi-layer perceptron neural networks. Finally, the improvement of legal amount recognition is being researched with the support of Hidden Markov Models (Rabiner, 1990), which have been shown to be robust for word identification in handwritten text. Results reported in several studies indicate that their use in legal amount recognition improves the recognition and validation rates for bank cheques.
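Two of the three word features can be illustrated with a small sketch. The paper does not describe how ascenders and descenders are detected, so the approach below is an assumption: the main body of the word is taken to be the rows whose ink count reaches a fraction (`body_fraction`, a hypothetical parameter) of the densest row, and any ink above or below that zone counts as an ascender or descender. Loop detection is omitted.

```python
def word_features(binary, body_fraction=0.5):
    """Ascender/descender flags and aspect ratio of a binarized word image.

    `binary` is a list of rows with 1 for ink; it must contain some ink.
    """
    counts = [sum(row) for row in binary]
    ink_rows = [y for y, c in enumerate(counts) if c]
    top, bottom = ink_rows[0], ink_rows[-1]
    peak = max(counts)
    # Rows dense enough to belong to the main body of the word.
    body = [y for y in ink_rows if counts[y] >= body_fraction * peak]
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    width = cols[-1] - cols[0] + 1
    height = bottom - top + 1
    return {
        "has_ascender": top < body[0],
        "has_descender": bottom > body[-1],
        "aspect_ratio": width / height,
    }
```

Features of this kind are cheap to compute but coarse, which is consistent with the modest recognition rates reported for the legal amount.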

3 RESULTS

The results obtained in digit recognition are shown in Table 1. The same methods used to obtain and recognize the digits in the courtesy field were also applied to the date of issue field. Table 2 shows the results obtained.

Table 1: Recognition rate (%) for different learning algorithms and sets of features.

            Grayscale   Threshold   Contour   Average
MLP           88.8        87.6       91.3      89.2
RBF           85.7        70.2       86.3      80.7
Poly SVM      88.8        87.6       92.5      89.6
SVM RBF       89.4        88.2       93.2      90.3
Average       88.2        83.4       90.8

Table 2: Recognition rate (%) for the date of issue field.

            Grayscale   Threshold   Contour   Average
MLP           78.7        65.8       93.0      79.2
RBF           70.3        75.1       87.3      77.6
Poly SVM      77.7        65.3       92.1      78.4
SVM RBF       81.2        79.3       94.0      84.8
Average       77.0        71.4       91.6

A simple inspection of the results allows us to conclude that the features based on the digit contour produced the best results, followed by the grayscale and binarized formats. As for the recognition algorithms, Support Vector Machines showed the best performance, followed closely by the Multilayer Perceptron neural network.

The analysis of the results for the recognition of the legal amount, presented in Table 3, shows that the SVM classifier with an RBF kernel has the best results for every feature set. Even though the overall results are insufficient, they can be used as a baseline for the HMM and Elastic Matching algorithms, which are both still under development.

Table 3: Recognition rate (%) for the legal field obtained by combining the features with the classifiers (A/D: ascenders and descenders; Loop: loop detection; Size: aspect ratio).

            A/D     A/D + Loop   A/D + Loop + Size   Average
MLP         27.4      36.5            37.4             33.8
KNN         25.3      34.3            36.2             31.9
Poly SVM    24.0      38.6            39.3             34.0
SVM RBF     28.6      40.1            41.0             36.6
Average     26.3      37.4            38.5

4 CONCLUSIONS

The automatic processing of bank cheques is of paramount importance in the sector. Retail banks need to assure a prompt answer to these payment requests, which amount to a significant number each day. Optimizing this decision-making process requires decisions to be uniform, objective and fast, with a minimum of mistakes and losses. In this work we have presented an automatic system for the handling of Portuguese cheques.

A database containing digitized images of Portuguese cheques was created for training and validating the system. In addition, a study of pre-processing methods allowed the elimination or attenuation of the noise present in the images and the successful extraction of the cheques and of the necessary fields. Cheque pre-processing and field extraction were aided by the standardized cheque layout adopted by Portuguese financial institutions, which allows precise localization of the required fields.

The comparison of machine learning algorithms showed that, for courtesy amount recognition, Support Vector Machines with an RBF kernel applied to digit contour features obtained the best results, with a 93.2% recognition rate.

Future work involves the implementation of a legal amount recognition module using Hidden Markov Models (Rabiner, 1990) and Elastic Matching (Uchida and Sakoe, 2005), so that the system can identify the written words, compare them with the previously recognised courtesy amount and validate the cheque.

REFERENCES

Arica, N. and Yarman-Vural, F. (2001). An overview of character recognition focused on off-line handwriting. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 31(2):216–233.

Gorski, N., Anisimov, V., Augustin, E., Baret, O., Price, D., and Simon, J.-C. (1999). A2iA check reader: A family of bank check recognition systems. In ICDAR ’99: Proceedings of the Fifth International Conference on Document Analysis and Recognition, page 523, Washington, DC, USA. IEEE Computer Society.

Guillevic, D. and Suen, C. (1998). HMM-KNN word recognition engine for bank cheque processing. In ICPR ’98: Proceedings of the 14th International Conference on Pattern Recognition, Volume 2, page 1526, Washington, DC, USA. IEEE Computer Society.

Kaufmann, G. and Bunke, H. (2000). Automated reading of cheque amounts. Pattern Anal. Appl., 3(2):132–141.

Palacios, R. and Gupta, A. (2002). A System for Processing Handwritten Bank Checks Automatically. SSRN eLibrary.

Rabiner, L. R. (1990). A tutorial on hidden Markov models and selected applications in speech recognition. pages 267–296.

Uchida, S. and Sakoe, H. (2005). A Survey of Elastic Matching Techniques for Handwritten Character Recognition. IEICE Trans Inf Syst, E88-D(8):1781–1790.

SIGMAP 2008 - International Conference on Signal Processing and Multimedia Applications
