Authors:
Taeka Awazu; Manami Fukuo; Masami Takata and Kazuki Joe
Affiliation:
Nara Women's University, Japan
Keyword(s):
Character Recognition, Character Clipping, Genetic Programming, Early-modern Japanese Printed Books.
Related Ontology Subjects/Areas/Topics:
Applications; Character Recognition; Classification; Evolutionary Computation; Pattern Recognition; Software Engineering; Theory and Methods
Abstract:
The website of the National Diet Library of Japan makes many early-modern (AD 1868-1945) Japanese printed books available to the public, but full-text search of them is essentially impossible. To support advanced search of this historical literature, the page images must be converted to text automatically. However, ruby, the small phonetic annotation characters printed beside the main text that are characteristic of Japanese books, poses a serious obstacle to this conversion. When existing OCR software is applied to early-modern Japanese printed books, the recognition rate is extremely low. To address this problem, we have previously proposed a multi-font Kanji character recognition method using the PDC feature and an SVM. In this paper, we propose a ruby character removal method for early-modern Japanese printed books using genetic programming, and evaluate our multi-font Kanji character recognition method on 1,000 types of early-modern Japanese printed Kanji characters.