RRConvNet: Recursive-residual Network for Real-life Character Image Recognition
Tadele Mengiste¹, Birhanu Hailu Belay¹, Bezawork Tilahun¹, Tsiyon Worku¹ and Tesfa Tegegne¹,²
¹Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia
²ICT4D Research Center, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia
Keywords: Ethiopic Character Image, OCR, Pattern Recognition, Recursive-CNN, Skip-connection.
Abstract: Variations in fonts, styles, and ways of writing a character have been major bottlenecks in OCR research. Such problems are increasingly tackled through advances in deep neural networks (DNNs). However, the number of network parameters and feature reusability remain issues when applying deep convolutional neural networks (DCNNs) to character image recognition. To address these challenges, in this paper we propose an extensible, recursive-residual ConvNet architecture (RRConvNet) for real-life character image recognition. Unlike standard DCNNs, RRConvNet incorporates two extensions: recursive supervision and a skip-connection. To enhance recognition performance and reduce the number of parameters required for extra convolutions, up to three recursions are proposed; feature maps after each recursion are used to reconstruct the target character, and the reconstruction layers are the same for all recursions. The second enhancement is a short skip-connection from the input to the reconstruction output layer, which reuses the character feature maps already learned by the prior layer. This skip-connection can also serve as an alternative path for gradients where the gradient becomes too small. With an overall character recognition accuracy of 98.2 percent, the proposed method achieves a state-of-the-art result on both publicly available and private test datasets.
1 INTRODUCTION
Nowadays, it is becoming increasingly important to have documents in a digital format for easy access to information and for efficient data storage and retrieval. For example, if a manuscript was published 100 years ago, it is nearly impossible to obtain the text of that ancient manuscript as an editable document such as a Word or text file. The only remaining choice is to type the entire text, which is an exhausting process if the text is large. The solution to this problem is optical character recognition. Optical Character Recognition (OCR) systems have been developed and widely applied for the digitization of many documents written in various scripts (Elleuch et al., 2016).
Optical Character Recognition (OCR) is the process of extracting text from images of handwritten or machine-printed documents. A single page of a sample Ethiopic script is illustrated in Figure 1. OCR has been, and still is, widely used for many scripts as a method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, and displayed online, and to facilitate human-to-machine and machine-to-machine communication such as machine translation, text-to-speech, key data entry, and text mining (Belay et al., 2019b).
Figure 1: Sample Ethiopic script image.
Optical character recognition is a strenuous field of research that requires great effort, and researchers have been exploring different strategies for about the
past half a century. Recently, deep neural networks have drawn the attention of many researchers due to their competence in solving computer vision problems such as object detection, classification, and recognition remarkably well (Bai et al., 2014). CNN is one of the most prominent types of deep neural networks; it can learn and extract features from images, and a CNN classifier can effectively recognize characters located in an image.
As Kim et al. (Kim et al., 2016) discussed in detail, using a standard deep CNN architecture (Bora et al., 2020) for character recognition substantially increases the number of parameters and requires more data to prevent over-fitting. Important hyper-parameters such as the degree of parameter sharing, the number of layers, the units per layer, and the overall number of parameters must be selected manually through trial and error (Eigen et al., 2013). In very deep neural networks, the gradient becomes too small as we approach the earlier layers; thus, the earlier layers are barely updated since the gradient approaches zero (Tan and Lim, 2019; Dai and Heckel, 2019). In such standard network architectures, little low-level information is shared between the input and output layers (He et al., 2016).
To address this, we introduce a Recursive-residual ConvNet (RRConvNet) for real-life Ethiopic character image recognition; a sample Ethiopic script is shown in Figure 1. This method consists of two approaches that ease the difficulty of training. First, all recursions are supervised: feature maps after each recursion are used to reconstruct the target character, and the reconstruction method (the layers dedicated to reconstruction) is the same for all recursions. As each recursion leads to a different character prediction, the predictions resulting from different levels of recursion are combined to deliver a more accurate final prediction. A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively to structured inputs. Finally, a structured prediction over variable-size input structures, or a scalar prediction on them, is produced by traversing a given structure in topological order.
The second extension is to use a skip-connection from the input to the reconstruction layer. In the experiment, the input is explicitly connected to the layers used for output reconstruction. This is especially effective when the input and output are highly correlated. Compared to previous character recognition methods, our approach utilizes a very large context with only a single recursive layer (Kim et al., 2016) and few parameters, since adding another layer increases the number of parameters (Eigen et al., 2013).
Figure 2: Overview of the generic OCR process (Meshesha and Jawahar, 2007). The process of using an OCR system can, in general, be broken down into four key steps. The first phase is image preprocessing, which involves a wide range of imaging functions such as image rotation, binarization, and de-skewing to improve image quality. The second, document analysis, defines the text recognition areas and provides data on the layout and formatting aspects of each page, as well as the document's overall structure. At the recognition stage, the actual texts are predicted. In the post-processing stage, the OCR errors are repaired and the model is updated.
The skip-connection has two advantages. First, the network capacity otherwise needed to store the input signal during recursions is saved. Second, an exact copy of the input signal can be used during target prediction (Kim et al., 2016). The proposed method demonstrates state-of-the-art performance on common benchmarks.
The rest of the paper is organized as follows. Related works are reviewed in Section 2. In Section 3, the proposed system architecture and details of the datasets are presented. Section 4 presents experimental results, and finally, conclusions are presented in Section 5.
2 RELATED WORKS
Pre-processing, segmentation, feature extraction, and classification are the generic stages that character recognition entails. While each stage affects recognition accuracy, the feature extraction technique (Gondere et al., 2019) has the most important influence. Layout analysis and text-line extraction are the first steps in analyzing a document image. For each line, the text is divided into distinct character images. Finally, the classifier receives these character images and generates class labels. An overview of the entire OCR process, as proposed by (Meshesha and Jawahar, 2007), is illustrated in Figure 2. Following such a generic OCR process performs well for well-printed or well-written manuscripts. For a long time, the document analysis community has been focused on automating reliable document
image recognition and information extraction methods (Younas et al., 2017). In contrast to Latin and Asian scripts, OCR research for low-resource scripts such as Ethiopic is still lacking (Belay et al., 2019b; Assabie and Bigun, 2007; Cowell and Hussain, 2003). Various methodologies have been used to build OCR systems for a variety of scripts, with ground-breaking results. A CNN-based handwritten Bangla character recognition system has been proposed by (Rahman et al., 2015), which normalizes written character images before using a CNN to classify individual characters, achieving a recognition accuracy of 85.36 percent on a dataset of 20,000 characters.
Mars and Antoniadis (Mars and Antoniadis, 2016) presented a model for Optical Character Recognition (OCR) in the Telugu language, which includes three parts: a database of Telugu characters, a deep learning-based OCR algorithm, and an online client-server application for the developed algorithm. Their model uses a Convolutional Neural Network (CNN) to classify the characters; they applied their OCR system to real data and reported good results. A hierarchical fuzzy convolutional neural network (HFCNN) (Chaudhuri and Ghosh, 2017) is used for the Czech-language character recognition task. It takes full advantage of deep CNNs for modeling long-term information in data sequences, working on a database that contains unconstrained handwritten text at a resolution of 300 dpi as PNG images with 256 gray levels.
The first work on Amharic script recognition was proposed by Alemu (Alemu, 1997), who developed an algorithm based on laser printouts of text in the normal type style of the WashRa font at 12-point size, and reported 97.31% character recognition accuracy. Later, Yaregal Assabie (Yaregal, 2002) explored various OCR development approaches for Amharic script and came up with a versatile algorithm that is independent of the Amharic characters' font size. The system correctly recognized 73.18% of the characters included in the training set.
Wondwossen (Mulugeta, 2004) developed an OCR model for a special type of handwritten Amharic text ("Yekum Tsifet") using a neural network approach. The results reported in this work are a 95.96% segmentation rate and 98.8% to 20.3% for recognition. An enhanced optical character recognition system for real-life degraded Amharic documents has been developed by (Birhanu, 2008), using an Artificial Neural Network (ANN) to classify the generated features. Accordingly, an average recognition rate of 96.87% is observed for test sets drawn from the training sets, and an 11.40% recognition rate for new test sets. So far, there have been very limited research efforts in Ethiopic character recognition, and there is no effective OCR application. The possible reasons mentioned are the large number of characters in the writing system, the existence of a large set of visually similar characters, variations in font, style, and writing materials, and the unavailability of a standard dataset (Assabie and Bigun, 2011; Meshesha and Jawahar, 2007; Belay et al., 2019a).
Prior studies focused on creating classifiers that handle character image sets of limited type and known fonts, and relied on standard deep neural network layers with a large number of parameters. In contrast to other character recognition methods, we developed an adaptive recursive CNN method that uses recursive convolution and a skip connection to reduce the number of parameters, save network capacity for storing inputs during recursions, and enable feature reusability.
3 MATERIAL AND METHODS
A general OCR system includes the basic steps shown in Figure 2 (Meshesha and Jawahar, 2007), starting from preparing a dataset and followed by training the model. In this section, the nature of the datasets used for training and model evaluation, the proposed algorithm, and the training schemes that we followed throughout the research are presented.
3.1 Datasets
This section describes the datasets we use for our experiments. To train and test the proposed model, two different datasets are used. The first dataset is the ADOCR database¹, a public and freely available dataset, which contains about 77,994 Amharic character images. Out of the 77,994 images, 7,800 (10%) are used for testing and the remaining 70,194 for training. The second dataset consists of 10,470 character images collected from various private sources; it includes historical handwritten and printed characters with different and also unknown font types. Once we collected these character images, we manually labeled each of them. We then took a randomly selected 10% of the dataset as a test set and kept the remaining character images for training. A few sample character images taken from the database are shown in Figure 3.
¹ http://www.dfki.uni-kl.de/ belay/
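To make the split described above concrete, the following is a minimal sketch of the 90/10 random split; the array shapes, the fixed seed, and the use of scikit-learn are our assumptions, not part of the authors' pipeline.

```python
# Minimal sketch of the 90/10 random split described above. The data
# loading is mocked with random arrays; in practice `images` would hold
# the manually labeled character images and `labels` their class indices.
import numpy as np
from sklearn.model_selection import train_test_split

num_images, num_classes = 10470, 319
images = np.random.rand(num_images, 32, 32, 1)           # placeholder images
labels = np.random.randint(0, num_classes, num_images)   # placeholder labels

# Randomly hold out 10% of the character images as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.10, random_state=42)
print(len(X_train), len(X_test))  # 9423 training / 1047 test images
```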
Figure 3: Sample segmented and binary Ethiopic character images. (a) Real-life Ethiopic character images. (b) Synthetically generated character images from the ADOCR database¹.
3.2 Proposed Algorithm
The overall framework of the proposed approach is shown in Figure 4. CNNs have the key property of sharing inner parameters across the network, which leads to scale-, shift-, and distortion-invariance, making them a powerful tool for image feature extraction with few preprocessing steps (Goodfellow et al., 2016). These properties mean that regardless of where and how a specific raw feature appears in the image, a suitable and well-trained CNN can capture that feature. After feature extraction has been performed, images can be classified, segmented, or even reconstructed. CNNs are formulated as a feature extraction block and a classification block (Figure 4). The initial block receives a grid-like topology input and hierarchically extracts representative features, followed by another block responsible for receiving the top hierarchical feature and providing a final matrix of predictions.
Figure 4: The proposed Recursive-ConvNet architecture showing the feature extraction and character recognition zones together with the main notations, where R = 3. The ⊕ denotes the concatenation operation as used in (Huang et al., 2017).
3.2.1 Recursive-ConvNet
Consider a character recognition problem in which the aim is to associate an input feature x (a character image) with an output y (a Unicode character) through a neural network function f(x). This neural network is trained using the Adam optimizer (Kingma and Ba, 2014) to minimize a loss function L over a character image dataset D. The network architecture is defined using a convolutional layer C, parameterized by weights W. To build a deep neural network, called the Recursive-residual ConvNet, we use a single convolution layer, as proposed in (Kim et al., 2016), which is iteratively applied R times in successive steps. The convolution layer in the proposed model uses a kernel size of 3 × 3, 32 filters, and same padding. The RRConvNet architecture is then defined by the following recursive sequence:
$$
\begin{aligned}
x_0 &= \text{input character image } (32 \times 32)\\
x_{t+1} &= C(x_t) \quad \text{for } t = 0, 1, \ldots, R-1\\
b &= \mathrm{Add}()\,[x_R,\; x_0]\\
b_{\mathrm{pool}} &= \mathrm{maxpooling}(b)\\
y &= \mathrm{FC}(b_{\mathrm{pool}})
\end{aligned}
\qquad (1)
$$
where $x_0$ denotes the input character image, $x_{t+1}$ is the feature map after passing through the convolution layer $C$, $b$ is the concatenation of the feature map $x_R$ and the input image, which constitutes the skip connection in the network, $x_R$ is the feature map at the end of the whole recursion ($R$), and $\mathrm{FC}$ is the fully connected network layer.
In our proposed architecture, the RecursiveConvNet layer iteratively performs a convolution operation R times on the input character image, where R = 3 is adopted from (Kim et al., 2016). The input image tensor is then directly fed into the reconstruction net whenever it is used during the recursions. In this case, the skip-connection has two advantages. First, the network capacity otherwise needed to store the input features during recursions is saved. Second, an exact copy of the input feature map of the character image can be used during target character recognition, which is usually called feature reusability. In addition, this skip connection can be used as an alternative path during back-propagation where the gradient is too small; thus, vanishing gradients are unlikely to be an issue while training our network. The concatenated features are then passed through a max-pooling layer. Before the features from the max-pooling layer are passed to the recognition phase, a similar convolution operation is applied to them.
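As an illustration of how the recursive convolution and skip-connection fit together, here is a minimal Keras sketch of the forward pass under equation (1). The initial embedding convolution (needed so the shared layer sees a constant channel count on every recursion), the use of Concatenate rather than the Add() written in equation (1) (Figure 4 labels the merge a concatenation), and the fully connected layer sizes taken from Section 4 are our assumptions, not the authors' released code.

```python
# A sketch of RRConvNet under eq. (1): one convolution layer C whose
# weights are reused R = 3 times, a skip-connection from the input to
# the merge point, max-pooling, a further convolution, and FC layers.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_rrconvnet(num_classes, R=3, input_shape=(32, 32, 1)):
    inputs = layers.Input(shape=input_shape)

    # Assumed embedding convolution: maps the image to 32 feature maps so
    # the shared layer's 3x3x32 kernels fit on every recursive application.
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)

    # The single shared convolution layer C, applied recursively.
    shared_conv = layers.Conv2D(32, (3, 3), padding="same", activation="relu")
    for _ in range(R):            # x_{t+1} = C(x_t)
        x = shared_conv(x)

    # Skip-connection: reuse the input features at the merge point.
    b = layers.Concatenate()([x, inputs])

    b = layers.MaxPooling2D((2, 2))(b)
    # "A similar convolution operation" before the recognition phase.
    b = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(b)

    b = layers.Flatten()(b)
    b = layers.Dense(512, activation="relu")(b)   # FC sizes from Section 4
    b = layers.Dense(1024, activation="relu")(b)
    outputs = layers.Dense(num_classes, activation="softmax")(b)
    return models.Model(inputs, outputs)
```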
In this case, pooling has two effects. First, it diminishes the number of computations per iteration, which significantly increases the speed of a forward pass during network training. Second, it allows the convolutional filter to take effect on larger regions of the initial character image. Finally, the feature maps from the max-pooling layer are flattened and fed into the fully connected (FC) layer to compute a probability distribution over the classes, where this probability is computed using the soft-max activation function in equation (2). The idea behind the recursiveness in this network architecture is to enhance output quality by considering previously computed information.
$$ f(z)_j = \frac{e^{z_j}}{\sum_{i=1}^{K} e^{z_i}} \qquad (2) $$

where $z$ is the input vector, $e^{z_j}$ denotes the standard exponential function applied to the $j$-th element of the input vector, and $K$ is the number of classes, so the denominator sums the exponentials of all $K$ elements.
The recognition loss of this network architecture is the categorical cross-entropy loss function and can be computed as

$$ L = -\sum_{i}^{C} y_i \log(\hat{y}_i) \qquad (3) $$
where $\hat{y}_i$ is the $i$-th scalar value in the model output, $y_i$ is the corresponding ground-truth value, and $C$ is the number of classes.
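The following toy computation checks equations (2) and (3) numerically; the 4-class logits and one-hot label below are made up for illustration.

```python
# Numeric check of eq. (2) (soft-max) and eq. (3) (cross-entropy).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()        # f(z)_j = e^{z_j} / sum_i e^{z_i}

z = np.array([2.0, 1.0, 0.1, -1.0])   # hypothetical logits for one image
y = np.array([1.0, 0.0, 0.0, 0.0])    # one-hot ground truth

y_hat = softmax(z)
loss = -np.sum(y * np.log(y_hat))     # eq. (3)
print(y_hat.round(3), round(loss, 3)) # [0.638 0.235 0.095 0.032] 0.449
```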
The overall character recognition accuracy of the model is computed as the ratio of correctly recognized characters to the total number of characters in the test dataset.
4 EXPERIMENTAL RESULTS
Our method is implemented in the Keras Application Program Interface (API) with a TensorFlow backend. In addition, we resized the images to 32 × 32 pixels.
Figure 5: Learning curves. (a) Training and validation accuracy on real-life character images; (b) training and validation accuracy using the ADOCR dataset.
The architecture is trained with a batch size of 16 running for 15 epochs. To select suitable network parameters, different values of these parameters were considered and tuned during experimentation; the results reported in this paper are obtained using the Adam optimizer and a convolutional neural network with 64 feature maps, a kernel size of 3 × 3, and a stride of 2. This CNN layer convolves over the input image three times recursively. The input image tensor is concatenated, through the skip connection, with the output tensor of the CNN layer and passed through a max-pooling layer with a 2 × 2 kernel, followed by two fully connected layers of 512 and 1024 neurons, respectively.
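A minimal sketch of this training configuration follows, reusing the hypothetical build_rrconvnet() and the data split from the earlier sketches; sparse categorical cross-entropy (for integer labels) and the validation slice are our assumptions.

```python
# Training setup reported above: Adam optimizer, batch size 16, 15 epochs.
model = build_rrconvnet(num_classes=319)       # 319 real-life classes
model.compile(
    optimizer="adam",                          # Kingma and Ba (2014)
    loss="sparse_categorical_crossentropy",    # eq. (3) with integer labels
    metrics=["accuracy"],
)
history = model.fit(
    X_train, y_train,
    batch_size=16,
    epochs=15,
    validation_split=0.1,   # assumed held-out slice for the learning curves
)
```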
Since we have two different datasets, we conducted two experiments; the learning curve of each experiment is shown in Figure 5. In the first experiment, the model was trained and evaluated with 28 × 28 synthetic character images from the ADOCR dataset, which has 231 unique characters. The second experiment was conducted using real-life character images consisting of 319 unique Ethiopic characters and digits written with different and also unknown fonts.
Our network was trained using different batch sizes, epochs, and network setups. The best test result is recorded with a batch size of 16 running for 15 epochs. The character recognition accuracy of our model is calculated as the ratio of correctly recognized characters to the total number of characters in the test set, multiplied by 100 (see equation (4)).
$$ A = \frac{\#\,\text{correctly recognized characters}}{\#\,\text{characters in the test set}} \times 100 \qquad (4) $$

where $A$ denotes accuracy.
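Equation (4) translates directly into code; the toy label arrays below are for illustration only, and in practice y_pred would come from model.predict on the test split.

```python
# Direct translation of eq. (4) with toy labels.
import numpy as np

y_true = np.array([3, 7, 7, 1, 0])   # toy ground-truth labels
y_pred = np.array([3, 7, 2, 1, 0])   # toy predictions (one error)

A = 100.0 * np.sum(y_pred == y_true) / len(y_true)
print(A)  # 80.0 for this toy example
```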
4.1 Performance Comparison
Based on the results recorded during experimentation, 98.2% of real-life character images and 94.75% of test images from the ADOCR database are correctly predicted. Compared to the real-life character images, character images in the ADOCR dataset are highly degraded, and there are even some deformed character images, as illustrated in Figure 7. Due to this, the character recognition accuracy on the ADOCR test set is lower than the recognition accuracy on the real-life test sets. As observed in Table 1, we achieved better performance compared with prior works on Amharic OCR using 231 classes. The other works presented in Table 1 are not directly comparable, since they used different datasets and experimental settings; they are included simply to indicate the progress of OCR for the Amharic script.
As illustrated in Figure 6, some characters are incorrectly recognized due to visual similarity, while others are misrecognized even with no apparent similarity between them, and still others are misrecognized due to poor image quality. Some of these deformed character images are shown in Figure 7.
Table 1: Performance of prior works and the proposed method.

Authors                          #Dataset   Type          Accuracy
(Yaregal, 2002)                  1,010      handwritten   73.18%
(Meshesha and Jawahar, 2007)     76,800     printed       90.37%
(Belay et al., 2018)             80,000     synthetic     92.71%
(Belay et al., 2019a)*           77,994     synthetic     93.45%
Ours*                            77,994     synthetic     94.75%
Ours                             10,470     real-life     98.2%

* Denotes methods tested on similar datasets.
Figure 6: Sample misrecognized historical character images, with the ground truth and prediction for each test image.
Figure 7: Deformed characters from the ADOCR test set that are wrongly recognized, with ground truth and prediction.
5 CONCLUSIONS
In this paper, we have introduced an adaptable Recursive-CNN-based method for real-life Ethiopic character image recognition. The method is a lightweight architecture with few network parameters and limited network capacity needed to store input features, and it can be easily adapted to other image-based pattern recognition problems. The proposed method is designed on top of an existing VGGNet-like CNN architecture with two extensions. The first is the recursive convolution, where a single convolutional layer is recursively convolved over the input character image, with one pooling layer followed by two fully connected layers. The second is the skip-connection, where the input feature maps of a character image are concatenated with the output feature maps. We evaluated our RRConvNet model on a publicly available dataset of 77,994 sample Amharic character images and on 10,470 real-life Ethiopic character images. Experiments on these datasets have shown that our model achieves high character recognition accuracy. Our proposed method minimizes the number of network parameters and allows feature reusability. As part of future work, the proposed network architecture can be extended, as a feature extractor, to text-line-level image recognition tasks by integrating it with recurrent neural networks.
ACKNOWLEDGEMENTS
This research work was partially supported by the ICT4D Research Center (annual research grant), Bahir Dar Institute of Technology, Bahir Dar University.
REFERENCES
Alemu, W. (1997). The application of ocr techniques to the
amharic script. An MSc thesis at Addis Ababa Univer-
sity Faculty of Informatics.
Assabie, Y. and Bigun, J. (2007). A neural network ap-
proach for multifont and size-independent recognition
of ethiopic characters. Advances in Pattern Recogni-
tion, pages 129–137.
Assabie, Y. and Bigun, J. (2011). Offline handwritten
amharic word recognition. Pattern Recognition Let-
ters, 32(8):1089–1099.
Bai, J., Chen, Z., Feng, B., and Xu, B. (2014). Image
character recognition using deep convolutional neu-
ral network learned from different languages. In 2014
IEEE International Conference on Image Processing
(ICIP), pages 2560–2564. IEEE.
Belay, B., Habtegebrial, T., Liwicki, M., Belay, G., and
Stricker, D. (2019a). Factored convolutional neural
network for amharic character image recognition. In
2019 IEEE International Conference on Image Pro-
cessing (ICIP), pages 2906–2910. IEEE.
Belay, B., Habtegebrial, T., and Stricker, D. (2018).
Amharic character image recognition. In 2018 IEEE
18th International Conference on Communication
Technology (ICCT), pages 1179–1182. IEEE.
Belay, B. H., Habtegebirial, T., Liwicki, M., Belay, G., and
Stricker, D. (2019b). Amharic text image recognition:
Database, algorithm, and analysis. In 2019 Interna-
tional Conference on Document Analysis and Recog-
nition (ICDAR), pages 1268–1273. IEEE.
Birhanu, A. T. (2008). Amharic Character Recognition Sys-
tem for Printed Real-Life Documents. PhD thesis, Ad-
dis Ababa University.
Bora, M. B., Daimary, D., Amitab, K., and Kandar, D.
(2020). Handwritten character recognition from im-
ages using cnn-ecoc. Procedia Computer Science,
167:2403–2409.
Chaudhuri, A. and Ghosh, S. K. (2017). Optical charac-
ter recognition system for czech language using hier-
archical deep learning networks. In Proceedings of
the Computational Methods in Systems and Software,
pages 114–125. Springer.
Cowell, J. and Hussain, F. (2003). Amharic character recog-
nition using a fast signature based algorithm. In Pro-
ceedings on Seventh International Conference on In-
formation Visualization, 2003. IV 2003., pages 384–
389. IEEE.
Dai, Z. and Heckel, R. (2019). Channel normalization in
convolutional neural network avoids vanishing gradi-
ents. arXiv preprint arXiv:1907.09539.
Eigen, D., Rolfe, J., Fergus, R., and LeCun, Y. (2013). Un-
derstanding deep architectures using a recursive con-
volutional network. arXiv preprint arXiv:1312.1847.
Elleuch, M., Tagougui, N., and Kherallah, M. (2016). A
novel architecture of cnn based on svm classifier for
recognising arabic handwritten script. International
Journal of Intelligent Systems Technologies and Ap-
plications, 15(4):323–340.
Gondere, M. S., Schmidt-Thieme, L., Boltena, A. S., and
Jomaa, H. S. (2019). Handwritten amharic charac-
ter recognition using a convolutional neural network.
arXiv preprint arXiv:1909.12943.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep
Learning. MIT Press. http://www.deeplearningbook.
org.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. (2017). Densely connected convolutional net-
works. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 4700–
4708.
Kim, J., Lee, J. K., and Lee, K. M. (2016). Deeply-recursive
convolutional network for image super-resolution. In
Proceedings of the IEEE conference on computer vi-
sion and pattern recognition, pages 1637–1645.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Mars, A. and Antoniadis, G. (2016). Arabic online hand-
writing recognition using neural network. Interna-
tional Journal of Artificial Intelligence and Applica-
tions (IJAIA), 7(5).
Meshesha, M. and Jawahar, C. (2007). Optical character
recognition of amharic documents. African Journal of
Information & Communication Technology, 3(2).
Mulugeta, W. (2004). Ocr for special type of handwritten amharic text ("Yekum Tsifet"): neural network approach.
Rahman, M. M., Akhand, M., Islam, S., Shill, P. C., and
Rahman, M. H. (2015). Bangla handwritten character
recognition using convolutional neural network. Inter-
national Journal of Image, Graphics and Signal Pro-
cessing, 7(8):42.
Tan, H. H. and Lim, K. H. (2019). Vanishing gradient mit-
igation with deep learning neural network optimiza-
tion. In 2019 7th international conference on smart
computing & communications (ICSCC), pages 1–4.
IEEE.
Yaregal, A. (2002). Optical character recognition of amharic text: an integrated approach. School of Information Studies for Africa. Addis Ababa University, Addis Ababa.
Younas, J., Afzal, M. Z., Malik, M. I., Shafait, F., Lukowicz,
P., and Ahmed, S. (2017). D-star: A generic method
for stamp segmentation from document images. In
2017 14th IAPR International Conference on Docu-
ment Analysis and Recognition (ICDAR), volume 1,
pages 248–253. IEEE.