AUTOMATIC SYSTEM FOR THE RECOGNITION OF AMOUNTS IN
HANDWRITTEN CHEQUES
Filipe Coelho, Luis Batista, Luis F. Teixeira and Jaime S. Cardoso
INESC Porto, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
Keywords:
Bank cheques, automatic system, handwritten text recognition.
Abstract:
Until the rise of electronic means for direct debit, bank cheques were regarded as the best form of payment,
balancing security and ease of use. Their acceptance and widespread use are the result of international agreements
that define rules for filling them in and using them. The fast processing of payments and transactions through safer
electronic methods has driven a reduction in their usage over recent years. Despite this progressive
reduction, bank cheques are still used and will continue to be; therefore, there is a need to optimize the processing
mechanisms. The existing automatic cheque processing systems are proprietary and not adapted to the
Portuguese language, which is crucial for cheque analysis and recognition. A prototype of an automatic
system for the recognition of the amount in Portuguese bank cheques has been implemented and is being used
as a test platform for improved intelligent character recognition algorithms.
1 INTRODUCTION
Bank cheques are probably the most widespread
type of document, with nearly one hundred billion
cheques circulating all over the world every year.
Retail banks need to ensure a prompt response to these
payment requests, which amount to a significant
number each day. Most cheques are still processed
manually by human operators, with the reading and
validation of the amount being the most common and
labour-intensive operations.
The Basel II Agreement demanded better security
procedures and fraud detection mechanisms in order
to improve bank cheque processing. Currently,
cheque recognition and validation require a large
investment in human resources by financial institutions,
because the multiplicity of handwriting styles, although
easily recognized by the human brain, remains difficult
for electronic systems. Automating these tasks may
yield substantial performance gains and allow the
reallocation of human resources to other tasks.
The performance of state-of-the-art Optical Character
Recognition (OCR) and Intelligent Character
Recognition (ICR) algorithms (Arica and Yarman-Vural,
2001) allows the development of systems capable of
recognizing handwritten text in cheques,
specifically the courtesy and legal amount fields for
comparison and validation. In fact, there are currently
various solutions in this area. However, most of these
systems are proprietary, managed by financial institutions
or dedicated companies that spent years on
their development and fine-tuning for specific countries
(Gorski et al., 1999; Kaufmann and Bunke, 2000;
Palacios and Gupta, 2002; Guillevic and Suen, 1998).
As a result, there is no open source system in this area
adapted to the Portuguese language and handwriting.
This paper describes a complete system for the
automated reading of amounts extracted from Portuguese
bank cheques. We present the specification
and implementation of a system integrating all the
required features. It integrates ICR technology,
easing the conversion of Portuguese
cheques to a structured, flexible, XML-based format.
The automatic processing of bank cheques is made
tractable only by the contextual constraints offered by
this application.
At the system level, we present the specification
of the system architecture and the implementation
of a prototype taking the proposed architecture as a
basis. At the recognition level, we detail the pre-
processing operations necessary for the successful ex-
traction of the legal and courtesy amounts. A compar-
ative study was conducted on the recognition of the
courtesy amount, assessing the strength of different
features and learning algorithms. Finally, we present
a critical study of specific techniques for the recognition
of the legal amount, outlining the current line of
investigation.
Coelho F., Teixeira L. F., Batista L. and Cardoso J. S. (2008).
AUTOMATIC SYSTEM FOR THE RECOGNITION OF AMOUNTS IN HANDWRITTEN CHEQUES.
In Proceedings of the International Conference on Signal Processing and Multimedia Applications, pages 320-324
DOI: 10.5220/0001937303200324
Copyright © SciTePress
2 SYSTEM ARCHITECTURE AND IMPLEMENTATION
In this section, we present the adopted architecture
and the technologies used for the system development, as
well as the research carried out on amount recognition.
2.1 General Architecture
The system proposed in this paper comprises a
database of Portuguese cheques and a web-based
application. The application supports the addition of
Portuguese cheques to the system, their recognition
and their conversion to a structured XML format in an
integrated manner. At the last stage of this process,
the user can confirm and correct the conversion results.
The architecture of the proposed system is based on
three main entities (the web application, the processing
engine and the database), as shown in Figure 1.
Figure 1: Generic system architecture.
The user interacts with the system through the web
application, which allows the complete management
of the cheques and associated metadata, as well as
system administration. The web application
allows the upload of scanned cheque images
and of the examples used to design the recognition algorithms
(or to improve their performance once deployed). It also
allows the verification and correction of the recognition
result for each uploaded cheque. Moreover,
additional metadata can be inserted and linked to the
cheque.
The processing engine (Figure 2) executes all
the cheque analysis operations, specifically the
pre-processing, the extraction of the required fields and
the amount recognition.
Figure 2: Processing engine.
The database stores the scanned cheques and their
digital counterparts in XML, as well as all the descriptive
metadata inserted by the user and the examples used
in the design of the recognition algorithms.
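The paper does not specify the XML schema, so the following is only a minimal sketch of what a stored cheque record might look like. The function `build_cheque_record` and the element names (`cheque`, `courtesyAmount`, `legalAmount`, `confirmed`) are hypothetical and chosen for illustration; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def build_cheque_record(cheque_id, courtesy_amount, legal_amount, confirmed):
    """Build a hypothetical XML record for one processed cheque."""
    cheque = ET.Element("cheque", attrib={"id": cheque_id})
    # Recognised amounts; element names are illustrative, not the paper's schema.
    ET.SubElement(cheque, "courtesyAmount").text = courtesy_amount
    ET.SubElement(cheque, "legalAmount").text = legal_amount
    # Records whether the user has confirmed the recognition result.
    ET.SubElement(cheque, "confirmed").text = "true" if confirmed else "false"
    return cheque


record = build_cheque_record(
    "0001", "125.50", "cento e vinte e cinco euros e cinquenta centimos", False
)
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the confirmation flag in the record reflects the workflow described above, in which the user verifies and corrects each recognition result.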
2.2 Prototype Implementation
We developed a prototype taking the system archi-
tecture shown in Figure 1 as a basis. The prototype
was developed on the Microsoft .NET 2.0 platform.
The web application was developed using ASP.NET,
and the processing engine is a C# application running
in the background. Both modules use ADO.NET
to access the SQL Server 2005 Express database. The
pre-processing was implemented with AForge.NET 1.62.
AForge.NET¹ is an open source platform for the
development of digital image processing applications on
the .NET platform.
The recognition of the courtesy amount was done
with the Weka² platform, which offers a collection of
machine learning algorithms for solving data mining
problems, implemented in Java and released as open
source under the GPL. Weka was integrated into
the developed prototype using the IKVM
virtual machine, which allows Java code to be converted
to .NET libraries that can be called from C#. With the
Weka library directly available to the system, its
implementations of machine learning algorithms could
be used directly.
¹ http://code.google.com/p/aforge/
² http://www.cs.waikato.ac.nz/ml/weka/
2.3 Processing Engine
The processing of a cheque first applies a set of
pre-processing operations that facilitate the main
recognition stage; the latter amounts to the recognition
of the courtesy and legal amounts. The pre-processing involves:
• Noise reduction/elimination:
  - median filtering: achieves both noise removal
    and edge preservation by using a 3 × 3 window
    and assigning to each pixel the median of the
    ordered values in the window;
  - contrast stretching: improves the contrast in
    the image by ‘stretching’ the range of intensity
    values it contains to span the full range of pixel
    values. This facilitates the use of a fixed
    threshold value in the next step;
  - binarization: turns the image black and white,
    enhancing the cheque limits and orientation for
    angle detection.
• Rotation angle detection and correction:
  - Principal Components Analysis (PCA) over the
    Fourier Transform (FT) of the image (Figure 3):
    the rotation angle is obtained from the slope of
    the first principal direction of the set of points;
  - rotation with bilinear interpolation: undoes
    the rotation present in the original image,
    aligning the cheque boundaries horizontally
    and vertically.
• Cheque extraction: the bounding box of the cheque is
  found through horizontal and vertical image projections
  (Figure 4).
• Field extraction: Portuguese cheques issued by
  financial institutions have a completely standardized
  layout, which allows us to find and retrieve
  their fields using known coordinates and dimensions
  (based on the cheque's width) (Figure 5).
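The noise reduction steps listed above can be sketched as follows. This is a minimal pure-Python illustration, not the AForge.NET code used in the prototype. It assumes a grayscale image stored as a list of rows of integers in [0, 255], leaves border pixels unfiltered, and uses an illustrative fixed threshold of 128, a value the paper does not state.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3 x 3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def stretch_contrast(img, lo=0, hi=255):
    """Linearly map the observed intensity range onto [lo, hi]."""
    flat = [p for row in img for p in row]
    p_min, p_max = min(flat), max(flat)
    if p_max == p_min:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = (hi - lo) / (p_max - p_min)
    return [[round(lo + (p - p_min) * scale) for p in row] for row in img]


def binarize(img, threshold=128):
    """Map pixels to 1 (ink, darker than threshold) or 0 (background)."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


# A light background, a dark horizontal bar and one isolated bright noise pixel.
image = [
    [180, 180, 180, 180, 180],
    [180, 255, 180, 180, 180],
    [60, 60, 60, 60, 60],
    [60, 60, 60, 60, 60],
    [180, 180, 180, 180, 180],
]
filtered = median_filter_3x3(image)
binary = binarize(stretch_contrast(filtered))
```

Stretching the contrast before thresholding is what makes a single fixed threshold usable across scans with different exposure, which is the motivation given in the list above.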
(a) Original Cheque. (b) Fourier transform.
Figure 3: Angle detection.
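The PCA step of the angle detection can be illustrated with the closed-form orientation of the leading eigenvector of a 2 × 2 covariance matrix. The sketch below assumes the input is a list of (x, y) coordinates, for example the positions of the strongest Fourier-magnitude coefficients. The computation of the Fourier Transform and the selection of those points are omitted, and the function name `principal_angle` is hypothetical.

```python
import math


def principal_angle(points):
    """Angle (degrees) of the first principal direction of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2 x 2 covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form orientation of the leading eigenvector of a symmetric 2 x 2 matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


# Points lying on a line tilted by 10 degrees.
tilted = [
    (t * math.cos(math.radians(10)), t * math.sin(math.radians(10)))
    for t in range(-5, 6)
]
angle = principal_angle(tilted)
```

The estimated angle would then be undone by rotating the image by the opposite amount, as described for the bilinear interpolation step.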
2.3.1 Courtesy Amount Recognition
(a) Rotated cheque. (b) Horizontal projection. (c) Vertical projection.
Figure 4: Bounding box detection.
(a) Courtesy amount. (b) Legal amount.
Figure 5: Extracted fields.
The recognition of the courtesy amount first requires
the segmentation of the individual digits composing
the courtesy value. This operation uses a priori
knowledge about the number of boxes in the courtesy
field and filters the region by eliminating small blobs,
assumed to be artefacts left by inaccurate
field extraction and cleaning. The bounding box of
each digit is then determined by vertical projection
analysis followed by horizontal projection analysis.
This separation avoids cutting digits in half because
of inaccurate thresholding of the courtesy field.
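The projection-based digit segmentation described above might look like the following sketch. It assumes a binary field stored as a list of rows with 1 for ink and 0 for background, and it assumes that digits are separated by at least one empty column. The small-blob filtering and the use of the known number of boxes are not included, and the helper names are hypothetical.

```python
def runs_of_ink(profile):
    """Return (start, end) index pairs of consecutive non-zero profile entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_digits(binary):
    """Bounding boxes (left, top, right, bottom) of digits, end-exclusive."""
    height, width = len(binary), len(binary[0])
    # Vertical projection: ink pixels per column split the field into digits.
    columns = [sum(binary[y][x] for y in range(height)) for x in range(width)]
    boxes = []
    for left, right in runs_of_ink(columns):
        # Horizontal projection within the column range trims top and bottom.
        rows = [sum(binary[y][left:right]) for y in range(height)]
        row_runs = runs_of_ink(rows)
        top, bottom = row_runs[0][0], row_runs[-1][1]
        boxes.append((left, top, right, bottom))
    return boxes


field = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment_digits(field)
```

Taking the first and last inked rows, rather than the first run only, keeps a digit whole even when thresholding leaves a gap inside it.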
To maximize the performance of the classification
stage, we assessed different sets of features and
different families of learning algorithms. As features,
we considered as classifier inputs the matrix of
grayscale values, the matrix of binary values (after
thresholding the pixel values), and the digit contour,
represented as the distance from each contour pixel to the
margin of the bounding box (see Figure 6). As classifiers,
we evaluated both multi-layer perceptron neural
networks and support vector machines.
(a) Grayscale. (b) Threshold. (c) Contour.
Figure 6: Representation of digits.
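One possible reading of the contour representation is a pair of left and right profiles. For each row of the digit's bounding box, the sketch records the distance from the left margin to the first ink pixel and from the right margin to the last ink pixel. The paper does not give the exact definition, so the choice of profiles, the handling of empty rows and the function name `contour_profile` are assumptions.

```python
def contour_profile(digit):
    """Left and right contour distances for a binary digit, one value per row each."""
    width = len(digit[0])
    left, right = [], []
    for row in digit:
        ink = [x for x, pixel in enumerate(row) if pixel]
        if ink:
            left.append(ink[0])                # gap between left margin and first ink pixel
            right.append(width - 1 - ink[-1])  # gap between right margin and last ink pixel
        else:
            left.append(width)                 # empty row: no contour, use the full width
            right.append(width)
    return left + right


digit = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
features = contour_profile(digit)
```

In practice the digit image would be resized to a fixed height first, so that every digit yields a feature vector of the same length.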
2.3.2 Legal Amount Recognition
The support lines in this field, necessary for a human
writer, are an obstacle to word segmentation.
Therefore, a first step is to remove them. The goal of line
removal is to remove the lines as much as possible while
leaving the words on the lines intact. This operation
was accomplished by counting the number of white
pixels along each row and building the corresponding
histogram. In this way the module was able to identify
the positions of both lines as maxima of the histogram
and eliminate them. The word extraction was then performed
with vertical projection analysis. The results of these
individual steps are illustrated in Figure 7.
(a) Legal field after the application of noise reduction
filters and blob filters.
(b) Legal field after the line removal algorithm.
(c) Word obtained after vertical projection.
Figure 7: Processing of the legal amount.
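The line removal and word extraction steps can be sketched as follows. The sketch assumes a binary field with 1 for ink, support lines that are one pixel thick, and words separated by at least `min_gap` empty columns. All of these are illustrative assumptions, and the function names are hypothetical. Note that simply erasing a row also cuts any stroke that crosses it, so a full implementation would need to restore those pixels.

```python
def remove_support_lines(binary, n_lines=2):
    """Erase the n_lines rows with the highest ink count (assumed support lines)."""
    counts = [sum(row) for row in binary]  # row histogram of ink pixels
    # Rows sorted by ink count, highest first; keep the n_lines strongest.
    order = sorted(range(len(binary)), key=lambda y: counts[y], reverse=True)
    line_rows = order[:n_lines]
    cleaned = [row[:] for row in binary]
    for y in line_rows:
        cleaned[y] = [0] * len(binary[y])
    return cleaned, sorted(line_rows)


def split_words(binary, min_gap=2):
    """Column ranges (start, end), end-exclusive, split at >= min_gap empty columns."""
    height, width = len(binary), len(binary[0])
    columns = [sum(binary[y][x] for y in range(height)) for x in range(width)]
    words, start, gap = [], None, 0
    for x, value in enumerate(columns):
        if value:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        words.append((start, width - gap))
    return words


field = [
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # support line
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # support line
]
cleaned, line_rows = remove_support_lines(field)
words = split_words(cleaned)
```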
Three features were computed for word recognition:
the existence of ascenders and descenders, loop
detection, and the aspect ratio of the word. The
selected features were used to train different classifiers:
k-nearest neighbour, support vector machines
and multi-layer perceptron neural networks. Finally,
the improvement of the legal amount recognition is
being investigated using Hidden Markov
Models (Rabiner, 1990), which have been shown to be
robust for word identification in handwritten text.
Results reported in several studies indicate that their use
in legal amount recognition improves the recognition
and validation rates for bank cheques.
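The three word features named above (ascenders and descenders, loops, aspect ratio) could be computed along the following lines. The paper does not describe its algorithms, so this is a sketch under stated assumptions: loops are counted as background regions enclosed by ink, using a 4-connected flood fill, and the main body of the word is taken as the band of rows whose ink count reaches a fraction `body_fraction` of the maximum. The heuristic and the function names are hypothetical.

```python
from collections import deque


def count_loops(word):
    """Count background regions fully enclosed by ink (4-connected flood fill)."""
    h, w = len(word), len(word[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sy, sx):
        """Mark one background region; return True if it touches the image border."""
        queue, touches_border = deque([(sy, sx)]), False
        seen[sy][sx] = True
        while queue:
            y, x = queue.popleft()
            if y in (0, h - 1) or x in (0, w - 1):
                touches_border = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and not word[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        return touches_border

    loops = 0
    for y in range(h):
        for x in range(w):
            if not word[y][x] and not seen[y][x] and not flood(y, x):
                loops += 1
    return loops


def has_ascender_descender(word, body_fraction=0.5):
    """Return (ascender, descender): ink above / below the dense main-body rows."""
    counts = [sum(row) for row in word]
    dense = [y for y, c in enumerate(counts) if c >= body_fraction * max(counts)]
    top, bottom = dense[0], dense[-1]
    return any(counts[:top]), any(counts[bottom + 1:])


def aspect_ratio(word):
    """Width divided by height of the word image."""
    return len(word[0]) / len(word)


# A 'b'-like shape: a tall stem (ascender) attached to a closed loop.
word = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
feature_vector = (*has_ascender_descender(word), count_loops(word), aspect_ratio(word))
```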
3 RESULTS
The results obtained in digit recognition are shown in
Table 1. The same methods used to obtain and recognize
the digits in the courtesy field were also applied
to the date of issue field. Table 2 shows the results
obtained.
Table 1: Recognition rate (%) for different learning algorithms
and sets of features.
            grayscale  threshold  contour  average
MLP            88.8       87.6      91.3     89.2
RBF            85.7       70.2      86.3     80.7
Poly SVM       88.8       87.6      92.5     89.6
SVM RBF        89.4       88.2      93.2     90.3
average        88.2       83.4      90.8
Table 2: Recognition rate (%) for the date of issue field.
            grayscale  threshold  contour  average
MLP            78.7       65.8      93.0     79.2
RBF            70.3       75.1      87.3     77.6
Poly SVM       77.7       65.3      92.1     78.4
SVM RBF        81.2       79.3      94.0     84.8
average        77.0       71.4      91.6
A simple inspection of the results allows us to con-
clude that the features based on the digit contour pro-
duced the best results, followed by grayscale and bi-
narized formats. As for the recognition algorithms,
Support Vector Machines have shown the best perfor-
mance, followed closely by the Multilayer Perceptron
neural network.
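The ranking stated above can be checked directly from the values in Table 1. The unlabelled last column and last row of the table are consistent with per-classifier and per-feature means, and the short sketch below recomputes them. The variable names are illustrative.

```python
# Recognition rates (%) from Table 1, per classifier: (grayscale, threshold, contour).
table1 = {
    "MLP": (88.8, 87.6, 91.3),
    "RBF": (85.7, 70.2, 86.3),
    "Poly SVM": (88.8, 87.6, 92.5),
    "SVM RBF": (89.4, 88.2, 93.2),
}
features = ("grayscale", "threshold", "contour")

# Mean over the three feature sets for each classifier (last column of the table).
classifier_mean = {name: sum(rates) / len(rates) for name, rates in table1.items()}

# Mean over the four classifiers for each feature set (last row of the table).
feature_mean = {
    feature: sum(rates[i] for rates in table1.values()) / len(table1)
    for i, feature in enumerate(features)
}

best_classifier = max(classifier_mean, key=classifier_mean.get)
best_feature = max(feature_mean, key=feature_mean.get)
```

With these values, `best_classifier` is the SVM with RBF kernel and `best_feature` is the contour representation, matching the conclusion drawn in the text.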
The analysis of the results for the recognition of the
legal amount, presented in Table 3, shows that the
SVM classifier with an RBF kernel achieves the best
results for every feature set. Even though the overall
results are insufficient, they can be used as a baseline
for the HMM and Elastic Matching algorithms, which are
both still under development.
Table 3: Results (%) of the recognition of the legal field
obtained by the combination of the features with the
classifiers.
            A/D   A/D + Loop   A/D + Loop + Size   average
MLP         27.4     36.5            37.4            33.8
KNN         25.3     34.3            36.2            31.9
Poly SVM    24.0     38.6            39.3            34.0
SVM RBF     28.6     40.1            41.0            36.6
average     26.3     37.4            38.5
4 CONCLUSIONS
The automatic processing of bank cheques is of
paramount importance in the banking sector. Retail banks
need to ensure a prompt response to these payment
requests, which amount to a significant number each
day. Optimizing this decision-making process requires
decisions to be uniform, objective and fast, with a
minimum of mistakes and losses. In this work we have
presented an automatic system for the handling of
Portuguese cheques.
A database was created containing digitized
images of Portuguese cheques for system training
and validation. Also, a study of pre-processing
methods allowed the correct elimination or attenuation of
existing noise in images, and the successful extraction
of cheques and the necessary fields. Cheque
pre-processing and field extraction were aided by the
cheque layout standardization applied in Portuguese
financial institutions, allowing precise localization of
the required fields.
The comparison of machine learning algorithms
showed that courtesy amount recognition by
Support Vector Machines with an RBF
kernel, based on digit contour analysis, obtained the
best results, with a 93.2% recognition rate.
Future work involves the implementation of a
legal amount recognition module, using Hidden
Markov Models (Rabiner, 1990) and Elastic Matching
(Uchida and Sakoe, 2005), with the aim of
identifying the written words, comparing them to the
previously recognised courtesy amount and validating the
cheque.
REFERENCES
Arica, N. and Yarman-Vural, F. (2001). An overview
of character recognition focused on off-line handwriting.
IEEE Transactions on Systems, Man, and Cybernetics,
Part C: Applications and Reviews, 31(2):216–233.
Gorski, N., Anisimov, V., Augustin, E., Baret, O., Price, D.,
and Simon, J.-C. (1999). A2iA check reader: A family
of bank check recognition systems. In ICDAR ’99:
Proceedings of the Fifth International Conference on
Document Analysis and Recognition, page 523,
Washington, DC, USA. IEEE Computer Society.
Guillevic, D. and Suen, C. (1998). HMM-KNN word
recognition engine for bank cheque processing. In ICPR ’98:
Proceedings of the 14th International Conference on
Pattern Recognition, Volume 2, page 1526,
Washington, DC, USA. IEEE Computer Society.
Kaufmann, G. and Bunke, H. (2000). Automated reading of
cheque amounts. Pattern Anal. Appl., 3(2):132–141.
Palacios, R. and Gupta, A. (2002). A system for processing
handwritten bank checks automatically. SSRN
eLibrary.
Rabiner, L. R. (1990). A tutorial on hidden Markov models
and selected applications in speech recognition. Pages
267–296.
Uchida, S. and Sakoe, H. (2005). A survey of elastic
matching techniques for handwritten character
recognition. IEICE Trans Inf Syst, E88-D(8):1781–1790.
SIGMAP 2008 - International Conference on Signal Processing and Multimedia Applications