Japanese Cursive Character Recognition for Efficient Transcription

Kazuya Ueki, Tomoka Kojima

Abstract

We conducted detailed experiments on Japanese cursive character recognition to promote the transcription and digitization of Japanese historical documents, using a publicly available kuzushiji dataset released by the Center for Open Data in the Humanities (CODH). Using deep learning, we analyzed the causes of recognition difficulty through a recognition experiment covering more than 1,500 classes of kuzushiji characters. Furthermore, assuming actual transcription conditions, we introduced a method that automatically determines which characters should be held for judgment by identifying characters that are difficult to recognize or that were not used during training. As a result, we confirmed that a classification rate of more than 90% could be achieved by narrowing down the characters to be classified, even when a recognition model with a classification rate of 73.10% was used. This function could improve transcribers' ability to judge correctness from context (namely, the previous and subsequent characters) during post-processing.
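The idea of holding uncertain characters for later judgment can be illustrated with a small sketch. The abstract does not state the criterion used to decide which characters to hold, so the code below assumes one common choice: reject a character when the classifier's highest class probability falls below a threshold. The function names, the threshold values, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def classify_with_rejection(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Optional[int]]:
    """Return the predicted class index per character, or None when held.

    Assumption: confidence is the maximum class probability. The paper's
    actual criterion is not given in the abstract.
    """
    decisions: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])
        decisions.append(best if p[best] >= threshold else None)
    return decisions


def accuracy_and_coverage(
    decisions: Sequence[Optional[int]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Accuracy on accepted characters and the fraction that was accepted."""
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(accepted) / len(labels)
    accuracy = (
        sum(d == y for d, y in accepted) / len(accepted) if accepted else 0.0
    )
    return accuracy, coverage


# Toy class probabilities for four characters over three classes (made up).
probs = [
    [0.95, 0.03, 0.02],  # confident, and correct for label 0
    [0.40, 0.35, 0.25],  # uncertain, and wrong for label 1
    [0.10, 0.85, 0.05],  # confident, and correct for label 1
    [0.34, 0.33, 0.33],  # uncertain, and wrong for label 2
]
labels = [0, 1, 1, 2]

all_decisions = classify_with_rejection(probs, threshold=0.0)   # accept all
held_decisions = classify_with_rejection(probs, threshold=0.8)  # hold unsure

print(accuracy_and_coverage(all_decisions, labels))
print(accuracy_and_coverage(held_decisions, labels))
```

In this toy data, raising the threshold trades coverage for accuracy on the accepted characters, which is the same kind of trade-off the abstract reports; the held characters would be left to the transcriber.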

Paper Citation


in Harvard Style

Ueki K. and Kojima T. (2020). Japanese Cursive Character Recognition for Efficient Transcription. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-397-1, pages 402-406. DOI: 10.5220/0008913204020406


in Bibtex Style

@conference{icpram20,
author={Kazuya Ueki and Tomoka Kojima},
title={Japanese Cursive Character Recognition for Efficient Transcription},
booktitle={Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2020},
pages={402-406},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008913204020406},
isbn={978-989-758-397-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Japanese Cursive Character Recognition for Efficient Transcription
SN - 978-989-758-397-1
AU - Ueki K.
AU - Kojima T.
PY - 2020
SP - 402
EP - 406
DO - 10.5220/0008913204020406