Authors:
Birhanu Belay (1, 2); Tewodros Habtegebrial (2); Gebeyehu Belay (1) and Didier Stricker (3, 2)
Affiliations:
(1) Bahir Dar Institute of Technology, Bahir Dar, Ethiopia
(2) Technical University of Kaiserslautern, Kaiserslautern, Germany
(3) DFKI, Augmented Vision Department, Kaiserslautern, Germany
Keyword(s):
Amharic Document Image, Automatic Feature, Binary SVM, CNN, Handwritten, Machine Printed, OCR, Pattern Recognition.
Abstract:
In many documents, ranging from historical to modern archived documents, handwritten and machine printed texts may coexist in the same document image, which raises significant issues within the recognition process and affects the performance of OCR applications. It is, therefore, necessary to discriminate between the two types of text so that the appropriate recognition technique can be applied to each. Inspired by the recent success of CNN-based features in pattern recognition, in this paper we propose a method that can discriminate handwritten from machine printed text-lines in Amharic document images. In addition, we demonstrate the effect of replacing the last fully connected layer with a binary support vector machine, which minimizes a margin-based loss instead of the cross-entropy loss. Based on the results observed during experimentation, using a binary SVM gives a significant improvement in discrimination performance compared to the fully connected layers.
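
To illustrate the difference between the two objectives mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes labels encoded as +1 (handwritten) and -1 (machine printed) and a single raw score per text-line; the function names, the label encoding, and the sample values are all hypothetical. It compares a standard binary hinge (margin-based) loss with the logistic form of the cross-entropy loss on the same scores.

```python
import numpy as np

def hinge_loss(scores, labels):
    """Mean binary hinge loss; labels are in {-1, +1} (assumed encoding)."""
    # A sample contributes zero loss once its margin y * s reaches 1.
    return float(np.mean(np.maximum(0.0, 1.0 - labels * scores)))

def cross_entropy_loss(scores, labels):
    """Mean logistic (binary cross-entropy) loss on the same raw scores."""
    # log(1 + exp(-y * s)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -labels * scores)))

# Hypothetical classifier scores for four text-lines (illustrative values only).
scores = np.array([2.5, 0.3, -1.2, -0.4])
labels = np.array([1, 1, -1, 1])  # +1 = handwritten, -1 = machine printed

print("hinge:", hinge_loss(scores, labels))
print("cross-entropy:", cross_entropy_loss(scores, labels))
```

In this sketch, samples classified with a margin of at least 1 contribute nothing to the hinge loss, whereas the cross-entropy loss remains positive for every sample. This is one commonly cited difference between the two objectives; the abstract does not state which factors explain the reported results.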